PREFACE


There is urgent need for a comprehensive description of the tech- 
niques employed in educational research. There are a large number 
of texts dealing with statistical methods, especially the more ele- 
mentary ones, but statistical procedures represent only one group of 
the techniques of educational research. Among the techniques for 
which we have no adequate treatment, the need is probably most 
urgent for those relating to setting up and conducting experiments. 
Experimental research is a means for evaluating educational pro- 
cedures and, hence, occupies a position of importance. In general 
outline, the procedure is simple, but an analysis reveals its com- 
plexity. The idea of “controlled experimentation” is easy to com- 
prehend, but it is not easy to specify precisely what is involved in 
maintaining a control group. 

In this bulletin an attempt is made to describe in some detail 
the procedure of controlled experimentation, and on the basis of the 
requirements revealed, a small group of experiments is evaluated. 
The analysis of the factors affecting pupil achievement and the 
evaluation of the factors considered are largely subjective. An at- 
tempt was made to utilize the best data obtainable, but the supply 
is inadequate and in some cases the information is not highly depend- 
able. Consequently, both the analysis and the evaluation must be 
considered tentative and subject to revision in the light of future 
investigations. The writers, however, believe that they have suc- 
ceeded in showing controlled experimentation to be a highly complex 
and an intricate type of research, rather than one which can be car- 
ried out successfully by any novice who is sufficiently interested. 

The bulletin should be of interest to teachers, supervisors, and 
administrators, as well as to research workers. The latter will find 
it helpful as a guide in planning and conducting an experiment and 
in interpreting the results. To the others, it should give a set of criteria
that may be used in evaluating the experimental investigations
reported in our educational literature. 

The writers are glad to take this opportunity to express their in- 
debtedness to Dr. C. W. Odell for a careful reading of Chapter II 
and to Mr. T. T. Hamilton, Jr., for the editing of the entire manuscript. 


January, 1930.
Walter S. Monroe
Max D. Engelhart
EXPERIMENTAL RESEARCH IN EDUCATION
CHAPTER I 


INTRODUCTION 


The passing of speculation and authority. Until recently the typical method of answering questions relative to education has been that of speculation, and the pronouncements of those recognized as authorities have been accepted generally as final; but history records a number of attempts to solve thought questions in education by means of trial and observation of results. For example, Vittorino da Feltre (1378-1446) followed this procedure in devising methods of teaching that attracted much attention to his school, the Casa Giocosa at Mantua.¹ Wolfgang von Ratke, or Ratich (1571-1635), also attempted to prove the value of his method by actual trial in practice.² The theories of Comenius and Rousseau found expression in practice through the founding of the Philanthropinum at Dessau by Johann Bernhard Basedow (1723-1790).³ Johann Heinrich Pestalozzi (1746-1827) put his educational theories into practice in his schools at Stanz, Burgdorf, and Yverdun.⁴ Johann Friedrich Herbart (1776-1841) was a firm believer in the value of experimental procedure and inaugurated a practice school along with his pedagogical seminar at the University of Königsberg.⁵

The evaluation of pedagogical theory by trial in practice was the aim of several pioneer experimental schools in the United States. Among the most notable of these were the Oswego Primary Teachers Training School, with its model school for observation, established by Edward A. Sheldon in 1861;⁶ the experimental school inaugurated by Francis W. Parker when he assumed the principalship of the Cook County Normal School in 1883;⁷ and the Laboratory School at the University of Chicago, established by John Dewey in 1896.⁸


¹ Woodward, W. H. Vittorino da Feltre and Other Humanist Educators. Cambridge, England: Cambridge University Press, 1905. 261 p.

² Raumer, Karl von. Geschichte der Pädagogik. Gütersloh: Druck und Verlag von C. Bertelsmann, 1902, p. 27-29.

A briefer description of this "experiment" is given in:

Graves, F. P. Great Educators of Three Centuries. New York: The Macmillan Company, 1912, p. 20-26.

³ Raumer, op. cit., p. 212-52.

Brief descriptions are given in: Graves, op. cit., p. 112-21.

Monroe, Paul. A Textbook in the History of Education. New York: The Macmillan Company, 1929, p. 580-83.

⁴ An account of his visit to Pestalozzi's institution at Yverdun is to be found in:

Raumer, op. cit., p. 340-59.

Other descriptions of Pestalozzi's work are to be found in:

Barnard, Henry. Pestalozzi and His Educational System. Syracuse, New York: C. W. Bardeen Company, 1906. 751 p.

Graves, op. cit., p. 122-66.

Monroe, op. cit., p. 601-22.

Parker, S. C. A Textbook in the History of Modern Elementary Education. Boston: Ginn and Company, 1912, p. 273-74.

⁵ For discussions of the work of Herbart see:

Compayré, Gabriel. Herbart and Education by Instruction. New York: Thomas Y. Crowell and Company, 1907. 142 p.

De Garmo, Charles. Herbart and the Herbartians. New York: Charles Scribner's Sons, 1896. 268 p.

Graves, op. cit., p. 167-93.

Monroe, op. cit., p. 622-39.

Parker, op. cit., p. 375-430.


Early experimentation handicapped by inadequate conception of 
control of educative factors and by lack of instruments for measuring 
pupil material and pupil achievement. The pioneer experimentation 
in education failed to yield dependable results because of an inade- 
quate conception of control of educative factors. A single group of 
pupils was subjected to a complex of educative influences, including 
the novel procedure that was being tried, and after the close of the 
experiment, the results were ascribed, in many cases erroneously, 
to the novel procedure alone. A repetition with another group of 
pupils secured contrary results. This is well illustrated by the success 
of enthusiastic reformers who, in their own schools, showed an ap- 
parent superiority of their methods. Repetition by less enthusiastic 
schoolmen often failed to substantiate the contentions of the re- 
formers. 

A second handicap in these early experiments was the lack of 
instruments for measuring pupil material and pupil achievement. 
Measurement is fundamental to experimentation. The investigator 
must measure the original status of the pupils participating in the 
experiment, submit them to the experimental procedure, and measure 
them again. The pioneer experimenters were handicapped by their 
inability to secure quantitative measurements of the initial status of 
their pupils and of their final status after they had been subjected 
to the experimental procedure. 


The development of the concept of control of experimental conditions. The investigations of Rice, which were made between 1894 and 1897, were transitional in the techniques used. The results obtained with one group of pupils were compared with the results secured from other groups of pupils. Comparison of results obtained by one procedure with results obtained by other procedures is a means of securing a measure of control of experimental conditions whose importance was recognized by Rice in the following statement:


By a comparative study of results, even on a much narrower basis than I have indicated, a great deal might be accomplished in a very brief period toward the solution of the problem of methods.⁹


⁶ For a description of this school see:

Autobiography of Edward Austin Sheldon. New York: Ives-Butler Company, 1911, p. 133-80.

Dearborn, N. H. "The Oswego Movement in American Education," Teachers College, Columbia University Contributions to Education, No. 188. New York: Bureau of Publications, Teachers College, Columbia University, 1925. 191 p.

⁷ Rugg states, ". . . . he built up in the Cook County Normal School a faculty of experimentalists, of fearless innovators, real students of childhood, and a practice school which proved an influential object lesson for both teachers and the general public." See:

Rugg, H. O. "Curriculum-Making in Laboratory Schools," Twenty-Sixth Yearbook of the National Society for the Study of Education, Part I. Bloomington, Illinois: Public School Publishing Company, 1926, p. 87-91.

⁸ Dewey, John. The School and Society. Chicago: University of Chicago Press, 1900. A revised edition of 164 pages was published by the University of Chicago Press in 1915.


The influence of Rice is evident in the report of an experimental investigation of spelling by Cornman. This research had as its object the determination of the relative effectiveness of formal instruction and incidental teaching in spelling. The results obtained in the two experimental schools were compared with those obtained in schools retaining the formal instruction.¹⁰

Prior to 1910 the use of control groups was most prevalent in learning experiments conducted by psychologists under laboratory conditions, but several notable experiments were carried out with the use of control groups under school conditions. Three may be mentioned from the field of transfer of training.

Bagley, W. C. and Squire, C. R. "Experiment on Transfer of Ideals of Neatness," performed in 1905 and reported in Bagley, W. C. Educational Values. New York: The Macmillan Company, 1911, p. 188-89.

Ruediger, W. C. "The Indirect Improvement of Mental Functions Thru Ideals," Educational Review, 36:364-71, November, 1908.

Winch, W. H. "The Transference of Improvement in Memory in School Children," British Journal of Psychology, 2:284-93, January, 1908; 3:386-405, December, 1910.

The extent to which the use of control groups has been recognized 
by experimenters in education is indicated by the fact that control 
groups were employed in thirty-five out of seventy-two experimental 
investigations reported in the Journal of Educational Research from 
January, 1920 to June, 1927,¹¹ and in seventeen out of twenty-six experiments reported as Teachers College Contributions to Education from 1918 to 1926.¹² It is evident that this technique is almost universally recognized as essential, even though a large proportion of
contemporary experimenters fail to employ it. 

The development of instruments for measuring pupil material. 
The use of control groups, as an experimental technique, rests on the 
assumption that equivalent groups can be secured. In order that 
equivalence may be secured, it is essential to measure the pupils with 
respect to characteristics which influence learning in the experiment. 


⁹ Rice, J. M. Scientific Management in Education. New York: Hinds, Noble and Eldredge, 1914, p. 51. The chapter from which this quotation was taken was first published in The Forum for

¹⁰ Cornman, O. P. Spelling in the Elementary School: An Experimental and Statistical Investigation. Boston: Ginn and Company, 1902, p. 59.

¹¹ Monroe, W. S., et al. "Ten Years of Educational Research, 1918-1927," University of Illinois Bulletin, Vol. 25, No. 51, Bureau of Educational Research Bulletin No. 42. Urbana: University of Illinois, 1928, p. 79-80.

¹² Ibid., p. 82.
Educational experimentation has acquired one of its most important tools in the development of tests to measure the chief of these characteristics, intelligence. The following paragraph briefly traces their development.

The work of Galton (from 1869) and of Cattell (from 1890) and other American psychologists on the differences in mental abilities of individuals has been said to mark the beginning of modern intelligence testing.¹³ In 1905, Binet, in collaboration with Simon, published the first individual intelligence scale.¹⁴ Intelligence testing became fairly common when Terman's revision of the Binet-Simon Scale became generally available in 1916. In 1918 appeared the first group intelligence scale designed for school use, that of Otis,¹⁵ and since 1918, group intelligence tests have been widely used in elementary and secondary schools, and to some extent in colleges and universities. It is estimated that seven to ten million intelligence tests are used annually at present.¹⁶ In 43 per cent of the learning experiments reported in the Journal of Educational Research from January, 1920 to December, 1928, intelligence tests were used to measure pupil material for the purpose of securing equivalent groups.
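Equivalent groups are formed by matching pupils on the measured trait. The text here does not say how the matching is done. The Python sketch below assumes one common approach: rank the pupils by score, take them two at a time, and place one member of each pair in each group. It is an illustration under that assumption, not the writers' procedure. The function name `matched_groups` and the pupil identifiers and scores are hypothetical.

```python
from statistics import mean


def matched_groups(scores):
    """Divide pupils into two groups of nearly equal mean test score.

    ASSUMPTION: matching by ranked pairs is used for illustration only.
    scores maps a (hypothetical) pupil identifier to an intelligence-test
    score. Pupils are ranked from highest to lowest and taken two at a
    time; the members of each pair go to different groups, and the group
    receiving the higher-scoring member alternates from pair to pair so
    that neither group is systematically favored.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    group_a, group_b = [], []
    for pair_index, i in enumerate(range(0, len(ranked) - 1, 2)):
        higher, lower = ranked[i], ranked[i + 1]
        if pair_index % 2 == 0:
            group_a.append(higher)
            group_b.append(lower)
        else:
            group_b.append(higher)
            group_a.append(lower)
    # With an odd number of pupils, the last (lowest-ranked) one is left unassigned.
    return group_a, group_b


# Hypothetical scores for six pupils.
scores = {"P1": 112, "P2": 98, "P3": 105, "P4": 120, "P5": 101, "P6": 117}
group_a, group_b = matched_groups(scores)
print(group_a, mean(scores[p] for p in group_a))
print(group_b, mean(scores[p] for p in group_b))
```

Matching on a single score equates the groups only in that trait. The other characteristics that influence learning in the experiment still have to be considered.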

The development of instruments for measuring pupil achievement. For securing control groups that are equivalent to experimental groups in such an important characteristic as previous school achievement and for measuring the experimental achievement, valid and reliable instruments are essential. In 1908, Stone, under the direction of Thorndike, devised the first standardized achievement test.¹⁷ This was followed in the next few years by Courtis' Arithmetic Tests, Series A (1909), Thorndike's Handwriting Scale (1909), Hillegas' Composition Scale (1912), Buckingham's Spelling Scale (1913), and Ayres' Handwriting and Spelling Scales (1912-15).¹⁸ In more recent years a multitude of achievement tests have been developed in almost all of the school subjects, both elementary and secondary, and to some extent in subjects of higher education. Some progress is being made at present in the development of measurements of character and personality. It has been estimated that thirty to forty million standardized tests and scales are used annually, of which three-fourths are tests of achievement.¹⁹ In 58.3 per cent of the learning experiments reported in the Journal of Educational Research from January, 1920 to December, 1928, standardized achievement tests were used to measure pupil achievement for the purpose of evaluating the effect of the experimental procedure.²⁰


¹³ Monroe, et al., op. cit., p. 89.

¹⁴ Ibid., p. 90.

¹⁵ Ibid., p. 98.

¹⁶ Ibid., p. 114.

¹⁷ Ibid., p. 90.

¹⁸ Ibid., p. 91.

¹⁹ Ibid., p. 114.


Development of statistical techniques used in securing equivalent groups and in interpreting differences in gains in achievement. The theory of correlation, discovered by Galton about 1875²¹ and extended by Pearson, Yule, Spearman, and others,²² has enabled experimenters to evaluate the validity and reliability of intelligence tests used to secure equivalence and of educational tests used to measure gains in achievement. Gauss, Encke, Quetelet, Galton, Pearson, Sheppard, Yule, Spearman, Filon, and Kelley should be mentioned for their work in the development of the statistics of errors. The formula for the error of a difference, particularly useful in the interpretation of differences in gains in experimentation, has evolved as a result of the work of Encke, Airy, Sheppard, and Yule.²³ The suggestion of the "experimental coefficient" by McCall in 1923 has provided experimental workers in education with a criterion for testing the statistical significance of a difference.²⁴ It would be possible to mention many other statistical devices that have been developed in recent years and that are of service in educational experimentation.
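A short computation makes the last two devices concrete. The Python sketch below illustrates them under stated assumptions; it is not a procedure taken from this bulletin.

It uses the standard textbook form of the standard error of a difference between two means, σ_D = √(σ₁² + σ₂² − 2rσ₁σ₂). Here σ₁ and σ₂ are the standard errors of the two means, and r is the correlation between them (zero for independent groups).

The experimental coefficient is taken here as D / (2.78 σ_D). The constant 2.78 is an assumption that should be checked against McCall (1923). All group figures are hypothetical.

```python
import math


def se_of_mean(sd, n):
    """Standard error of a mean: the standard deviation divided by sqrt(N)."""
    return sd / math.sqrt(n)


def se_of_difference(se1, se2, r=0.0):
    """Standard error of the difference between two means.

    r is the correlation between the two means: 0 for independent groups,
    positive when the groups have been paired or equated on a measure.
    """
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


def experimental_coefficient(diff, se_diff, k=2.78):
    """Experimental coefficient, taken here as diff / (k * se_diff).

    ASSUMPTION: k = 2.78 is the constant usually attributed to McCall (1923).
    """
    return diff / (k * se_diff)


# Hypothetical mean gains for an experimental and a control group.
gain_exp, gain_ctl = 12.0, 9.0
se_exp = se_of_mean(sd=6.0, n=36)  # 1.0
se_ctl = se_of_mean(sd=6.0, n=36)  # 1.0
diff = gain_exp - gain_ctl

for r in (0.0, 0.5):
    se_d = se_of_difference(se_exp, se_ctl, r)
    print(f"r={r}: D={diff}, se_D={se_d:.3f}, "
          f"D/se_D={diff / se_d:.2f}, EC={experimental_coefficient(diff, se_d):.2f}")
```

With these figures, the difference of 3.0 points is about 2.1 times its standard error when the groups are treated as independent, and exactly 3.0 times when r = 0.5. Positive correlation between the groups, such as results from equating them on a test, reduces the error of the difference.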


Development of educational tests accompanied by interest in experimentation under school conditions. The development of educational tests was accompanied by increasing interest in experimentation under school conditions. Leaders in the field of education stimulated this interest by speeches at educational meetings and by editorials in educational journals. The following quotation from an editorial in the first number of the Journal of Educational Psychology is characteristic of these utterances.

Educational practice is still very largely based on opinion and hypothesis, and thus will it continue until competent workers in large numbers are enlisted in the application of the experimental method to educational problems. Little more than a beginning has been made in this important movement.²⁵


²⁰ It is, of course, not essential that an achievement test be standardized for it to be suitable for use in an experiment. Standardized tests are usually better constructed than tests made informally and, as such, are better measures of achievement. See:

Odell, C. W. Traditional Examinations and New-Type Tests. New York: The Century Company, 1928, p. 21.

²¹ Plana, Gauss, and Bravais, among others, had developed some ideas of correlation before Galton, but the first clear statement of the theory and the first use of the term "correlation" in 1888 must be credited to him. See:

Walker, Helen M. Studies in the History of Statistical Method. Baltimore: The Williams and Wilkins Company, 1929, p. 92-106.

²² Ibid., p. 107-41.

²³ Ibid.

²⁴ Ibid., p. 180. The idea of such a ratio was first developed by De Moivre, Kramp, and McGaughy.

²⁵ Journal of Educational Psychology, 1:2, January, 1910. (An editorial.)
In an article published in this same issue of the journal, which has
since published more learning experiments than any other periodical, 
Thorndike states: 


Schoolroom life itself is a vast laboratory in which are made thousands of experiments of the utmost interest to "pure" psychology. . . . . Experts in education studying the responses to school situations for the sake of practical control will advance knowledge not only of the mind as a learner under school conditions but also of the mind for every point of view.²⁶


Dearborn urged the repetition of laboratory experiments under school conditions. His emphasis on the use of appropriate techniques and his plea for careful work, coming as they did in 1911, long before "mass production" in educational research had been reached, should not fail to be noted. The following quotation illustrates the character of Dearborn's pleading:

If this is to be a serious school experiment, practice should be carried out for months at a time, and longer. The entire subject may be dropped for a year or more from the work of one class or group and carried on with regular and persistent practice in a comparable group. Such an arrangement of the work in the early years of the elementary school in view of the importance of the experiment and in view of the possible flexibility of the elementary-school course would not be an undue interference with the work of the school.²⁷


In 1914, Whipple urged that learning experiments under labora- 
tory conditions and using adults as subjects be repeated under school 
conditions with children as the subjects. Such a statement as the 
following could not help but stimulate research workers in the field 
of education to undertake investigations of the type advocated. 


. . . . I believe that in one important phase of experimental work, that dealing with the effects of practice and its spread or transfer, experimentation with children has been somewhat neglected, and that most of the conclusions now current upon the nature of formal discipline have been based upon observations carried on with adults. . . . . The whole problem of practice might be recanvassed to advantage with children working under classroom conditions.²⁸


A sublime faith in the value of experimentation in the solution 
of educational problems is expressed in the following quotations: 


Now comes the experimentalist, and with clear, unfaltering eye and steady, relentless tone, he demands of each subject the justification for its existence.²⁹

Everywhere there are evidences of an increasing tendency to evaluate educational procedures experimentally. . . . . Scientific organizations, research committees, an institute of educational research, and large educational foundations are lending such impetus as make experimental education the most important current movement in education.³⁰


²⁶ Thorndike, E. L. "The Contribution of Psychology to Education," Journal of Educational Psychology, 1:12, January, 1910.

²⁷ Dearborn, W. F. "Experimental Education," School Review Monograph No. 1. Chicago: University of Chicago Press, 1911, p. 10.

²⁸ Whipple, G. M. "Applicability to Children Secured with Adults," Journal of Educational Psychology, 1914. (An editorial.)

²⁹ Bell, J. C. "A New Humanism Needed," Journal of Educational Psychology, March, 1918. (An editorial.)

³⁰ McCall, W. A. How to Experiment in Education. New York: The Macmillan Company, 1923, p. 2.
It is to the experimental method that education must look for the solution of many of its most vexing problems. It is upon this basis that the ultimate establishment of education as a science must rest.³¹

Thus within a relatively short period, controlled experimentation reached the highest vogue in the repertoire of research workers in education. Through the stimulation of leaders in this field, the multitude engaged in educational experimentation. The following paragraphs portray the awakening of a few to the limitations of present experimental method even with perfected techniques.


Recent criticism of the experimental method in education. In 
recent years the feeling has arisen on the part of some leaders in the 
field of education that educational experimentation, as it is carried on 
at present, is largely futile. 

The need for a program of research in teaching becomes more apparent when 
the nature of the so-called “scientific investigations” in that field is considered. 
In general, many of the investigations are too limited in duration, involve too 
few subjects, and are too crudely done to warrant satisfactory conclusions. The 
topics investigated are unrelated, and many of those attempting research have 
not been properly trained for such work.³²


A survey of the learning experiments reported in the Journal of Educational Research, Journal of Educational Psychology, and the Teachers College, Columbia University Contributions to Education during the period 1918-27 provided the data on which the following conclusion is based.

Although no systematic survey has been made, it appears that the permanent 
accomplishments of educational research during this period are much less than 
the quantity of production would lead one to expect. This is especially true of 
experimental studies.³³

Henmon, after three years' work with the Modern Foreign Language Study in the production of tests and in the setting up of controlled experiments, reflects as follows on the possibilities and difficulties of experimentation:

We teach our students to be scornful of tradition and mere observation and insist that all things must be subjected to the test of controlled experimentation.

This is undoubtedly a healthy attitude to take if education is to become a science but the constant reader of the present day educational literature cannot in his critical moments help but be troubled by the imperfections and ambiguities of our measurements and the inconclusiveness of our sporadic experiments. When, for example, on such an important problem for educational theory and practice as the effect of equal practice on individual differences, whether equal practice increases or decreases them, we find out of twenty-four experimental studies twelve of them leading at least tentatively to the conclusion that differences are increased and twelve to the conclusion that differences are decreased, we cannot help wondering about our experiments and about the conclusions derived from them.³⁴


³¹ Good, C. V. How to Do Research in Education. Baltimore: Warwick and York, 1928.

³² Woody, Clifford. "The Values of Educational Research to the Classroom Teacher," Journal of Educational Research, 16:175, October, 1927.

³³ Monroe, W. S., et al. "Ten Years of Educational Research, 1918-1927," University of Illinois Bulletin, Vol. 25, No. 51, Bureau of Educational Research Bulletin No. 42. Urbana: University of Illinois, 1928, p. 84.


The following quotations show that the feeling of distrust for 
the results of educational experiments reported in the literature is 
not restricted to the men quoted above. 


Or the investigator gives a few standard tests; he finds the pupils very deficient. He calls the teachers together; he arouses great enthusiasm, doubles the time to be given to the subject, introduces an entirely different method, works up a high degree of skill in the use of it, and after a few months concludes that the new method was alone responsible for the improvement observed. Everybody should at once follow suit.³⁵

Perhaps the extreme case is that of the examination and treatment of a 
fourth-grade pupil, found to be deficient in reading. After a brief diagnosis and 
application of “remedial measures,” the announcement is gravely made that in
the light of this experience we may safely assume that the proper method of 
dealing with all fourth-grade pupils having similar disabilities is that used in 
this case. Making a sweeping generalization on the basis of a single instance 
would seem to exhaust the possibilities of the scientific method in education and 
leave nothing to be desired in the way of economy, efficiency, and dispatch. 
Many of the “conclusions” appended to recent “scientific” investigations have 
little more to support them. We are in a fair way to be able to prove anything. 
A few figures and a graph will turn the trick.³⁶

We have observed in many of the practices of educational research workers 
a tendency to shallowness. We have taken occasion to point out more than once 
a lack of sustained effort, a willingness to flit from one thing to another, and an 
unwillingness to stay with a problem until fundamental—the word seems to 
haunt us—until fundamental results are secured .... We are threatened with 
becoming mere dabblers in research, foolishly confident of the virtues of a fresh 
start.³⁷

Another line of inquiry has to do with the operations of the classroom. 
Some of the most influential investigations made in recent years have had to do 
with the problems of classroom procedure, and yet anyone who contrasts the 
facts which appear during observation of a good teacher and the recommenda- 
tions made in even our best textbooks on methods knows that the scientific 
description of teaching is in its infancy.³⁸

We must use greater care to make certain that the conclusions we state in 
our reports follow logically from the data presented. Too many reports state 
conclusions that are not fully supported by the research data included in them. 
This association should interest itself in the quality as well as in the quantity of 
educational research.³⁹

Nevertheless, I can not evade the conviction that, relatively speaking, the published research in education is, on the whole, inferior in quality, and more especially inferior in ultimate significance, to the published research in other branches of scientific endeavor. Too many contributions seem essentially futile. After you read them, you feel like saying: "Well, suppose it is true; what of it?"⁴⁰


³⁴ Henmon, V. A. C. "Measurement and Experimentation . . . ," Journal of Educational Research, 18:185-96, October, 1928.

³⁵ "Assuming the Major Premise," Journal of Educational Method, 2:229, February, 1923. (An editorial.)

³⁶ Loc. cit.

³⁷ ". . . in Research," Journal of Educational Research, 9:331, April, 1924. (An editorial.)

³⁸ Judd, C. H. "Research in Elementary Education," Journal of Educational Psychology.

³⁹ Trabue, M. R. "Educational Research in 1925," Journal of Educational Research, 13:344, May, 1926.

It is easily charged, and must be admitted, that initial effort to apply experimental techniques to the intricate problems of human affairs is often a lame and halting procedure, and far too much may easily be claimed by way of fact and inference as forthcoming from first efforts in this direction.⁴¹


Educational experimentation in a plateau period. The previous 
discussion has traced the past of educational experimentation. It 
has been shown that this method of answering thought questions in 
education has undergone an evolutionary development over a period 
of some centuries. The contributions of laboratory experimentation 
in the field of psychology and the aid rendered by the production of 
more suitable measuring instruments have been mentioned. Finally, 
some indication was given of the effect of the writings of prominent 
leaders in the field. These writings, stimulating and optimistic for 
the most part a few years ago, have been replaced by others reflecting 
disillusionment and an attitude of distrust for this method of research 
in education. However, the feeling seems to be that the fault lies 
not with the fundamental theory of experimentation nor with the 
difficulties involved when human beings are the subjects of experiment, but with our present experimental techniques. That is to say, there is a feeling that the mediocre quality of experimental results has been due to the lack of adequate techniques, and a belief that conclusive results will be secured when techniques are perfected. If
this is true, possibly an analogy may be drawn with plateaus in learn- 
ing. The lower order “habits” have been formed; improvement is at 
a standstill until higher order “habits” have been perfected. 


Definition of experiment. The foregoing discussion has been 
given before defining the term “experiment,” since the concept repre- 
sented by this term has undergone an evolution analogous to the 
historical development of the method itself. The concept of “educa- 
tional experimentation” that is expressed in the following paragraphs 
is the culmination of this development; it is, therefore, appropriately 
given at this time. 

A child's achievement from a period of learning is the resultant of several educative factors. "Experimentation" is the name given to the type of educational research in which the investigator controls the educative factors to which a child or group of children is subjected during the period of inquiry and observes the resulting achievement. The meaning of this definition will be clearer if consideration is given to some of the procedures employed. In the simplest type of educational experiment the investigator seeks to evaluate the influence of some one educative or "experimental" factor on a single group of children. He must start the experiment with some measurement of the initial attainment of the children in the trait or ability to be influenced. He then subjects the group to the experimental factor, such as a particular type of drill material in arithmetic, for the duration of the experiment. At the end, the investigator applies a final test for the purpose of determining the gain in achievement that has resulted from the application of the experimental factor. This simple type of experiment may be illustrated by describing briefly one reported by Glick.


⁴⁰ Whipple, G. M. "The Improvement of Educational Research," School and Society, 1927.

⁴¹ M. E. "The Scholarly Study of College Education," Journal of Educational Research, 19:140, February, 1929. (An editorial.)

This experiment had as its problem the determination of the effect
of practice on intelligence test scores.42 Students were tested at the
start of the experiment with one of the forms of the Army Alpha 
Intelligence Examination. The experimental factor consisted of prac- 
tice exercises similar to, but not identical with, the exercises in the 
sub-tests of the intelligence examination. After certain intervals 
other forms of the Army Alpha were administered to these same 
students. The increase in scores from one application of the intelli- 
gence examination to another is a measure of the effect of the ex- 
perimental factor operating over the interval of practice. 

A single group experiment, such as the one just described, is ap- 
propriate when it is evident that the effect is to be ascribed to the 
operation of only one educative factor. In many cases, however, such 
a situation does not exist. The subjects are influenced not by one
factor but by many. If one were
to use a single group of individuals, it would be impossible to say 
how much of the effect was due to any particular cause. When two 
or more groups are used, it is possible to subject them to identical 
conditions with the exception of the experimental factor. The dif- 
ference in the effect when one group is compared with the other may 
be ascribed to the operation of this single factor. This method may 
be illustrated by describing an experiment by Anibel. 

The problem of this investigation was the determination of the
comparative effectiveness of the lecture-demonstration and individual-laboratory
methods in chemistry.43 Anibel set up two groups presumably
equivalent in intelligence as follows: “A student from the
lecture-demonstration or test group was paired, for the purpose of
comparing achievement records, with a student from the individual-laboratory
class or control group.”44 The investigator sought to keep
certain educative factors identical in both of these groups. He states,
“all classes in chemistry met five times per week for forty-five minute
periods .... The classroom instruction was identical for the two
groups, thus equalizing any factors that might be present in classroom
instruction.”45 After getting these educative factors under control,
it was possible for him to use different instructional procedures
in the laboratory instruction of the groups. In one group the chemical
experiments were demonstrated before the pupils, while the pupils
of the other group were required to perform the experiments for
themselves. The difference between the two groups in gains in
achievement is to be ascribed, with certain limitations, to the
superiority of one of these instructional procedures over the other.

42Glick, H. N. “Effect of Practice on Intelligence Tests,” University of Illinois Bulletin, Vol. 23, No. 3, Bureau of Educational Research Bulletin No. 27. Urbana: University of Illinois, 1925, p. 6.
43Anibel, F. G. “Comparative Effectiveness of the Lecture-Demonstration and Individual-Laboratory Method,” Journal of Educational Research, 13:355-65, May, 1926.
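Anibel's groups were formed by pairing each lecture-demonstration student with an individual-laboratory student, but the quotation does not say how each partner was chosen. The Python sketch below shows one plausible way of forming such pairs, under stated assumptions: pupils are ranked on an intelligence score, adjacent pupils in the ranking are paired, and chance decides which member of each pair goes to which group. The function `form_matched_pairs` and the scores are hypothetical and are not taken from Anibel's report.

```python
import random
from statistics import mean


def form_matched_pairs(scores, seed=None):
    """Pair pupils of adjacent intelligence scores and split each pair at
    random between an experimental and a control group.

    Hypothetical procedure for illustration; not Anibel's documented method.
    `scores` maps a pupil identifier to an intelligence-test score. When the
    number of pupils is odd, the lowest-ranked pupil is left unpaired.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest first
    pairs, experimental, control = [], [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance, not the experimenter, picks the group
        experimental.append(pair[0])
        control.append(pair[1])
        pairs.append((pair[0], pair[1]))
    return pairs, experimental, control


# Hypothetical scores for seven pupils; one will be left unpaired.
scores = {"A": 112, "B": 98, "C": 105, "D": 121, "E": 99, "F": 110, "G": 87}
pairs, exp_group, ctl_group = form_matched_pairs(scores, seed=1)
print(pairs)
print(mean(scores[p] for p in exp_group), mean(scores[p] for p in ctl_group))
```

Because the members of each pair differ by only a few points, the two group means cannot differ by more than the average within-pair difference; the pupil left over is simply not assigned.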

The problem of this investigation. The previous discussion has 
indicated to the reader that educational experimentation has under- 
gone a long period of development. In the last twenty years, hundreds 
of learning experiments have been performed under school condi- 
tions. Early enthusiasm has been replaced to some extent by ex- 
pressions of distrust. Hence, there appears to be a need for a critical 
analysis of experimentation as a procedure in educational research. 
In the chapters that follow, the present writers attempt: 

1. To describe in detail the procedure that should be followed in 
educational experimentation to arrive at dependable conclusions. 

2. To apply the procedure outlined as a means of evaluating a 
group of experiments. 

3. To formulate an appraisal of the present status of experimen- 
tation as a procedure in educational research. 


44Anibel, op. cit., p. 356. 
45Ibid., p. 356.


CHAPTER II 


THE REQUIREMENTS FOR CONTROLLED GROUP 
EXPERIMENTATION 


The general plan of controlled group experimentation. In a con- 
trolled experiment there are two groups of pupils which are equivalent 
in all respects that affect learning in the field of experimentation. The 
instruction and other educative influences to which the two groups are 
subjected are the same except for one factor. This experimental factor 
may be an instructional technique, the size of the class, the textbook, 
or any other educative influence that may be studied experimentally. 
The difference in the gains in achievement made by the two groups 
during the period of experimentation is an index of the relative merits 
of the two forms of the experimental factor.1 This plan may be described
more formally as follows:

Let $E_1$ = mean initial status of the experimental group in the
abilities that the application of the experimental factor is expected
to affect.

$C_1$ = mean initial status of the control group in the same abilities.

$E_2$ = mean final status of the experimental group in the abilities
that the application of the experimental factor is expected to have
affected.

$C_2$ = mean final status of the control group in the same abilities.

$$E_2 - E_1 = \text{Gain } E$$

$$C_2 - C_1 = \text{Gain } C$$

The “difference in gain,” D, equals the result found when Gain C 
is subtracted from Gain E. If this difference is positive, the status of 
the experimental factor prevailing in the experimental group repre- 
sents the more effective instructional conditions. If the difference is 
negative, the opposite conclusion is indicated. The validity of this 
interpretation depends upon the satisfaction of three requirements: 
(1) The two groups of pupils are equivalent at the start of the experi- 
ment. (2) All educative factors except the experimental one are the 
same for both groups. (3) The measures of achievement from which 
the gains are computed are both valid and accurate. When any one 

1An educational experiment may involve more than two groups of pupils and may be more complex in other respects, but the following discussion assumes the simple plan described here. Later, attention will be given to the procedure of the more complex experiments.
of these requirements is not fully realized, it becomes necessary to
discount the difference in the gains made by the two groups. If the 
difference is small, and if the departure from the requirement is large, 
the relative merits of the two procedures compared will not have been 
determined. 
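The formal plan reduces to a short computation: D = Gain E − Gain C, that is, (E₂ − E₁) − (C₂ − C₁). The Python sketch below computes the four group means, the two gains, and D from lists of pupil scores. The function name `difference_in_gain` and the scores are invented for illustration. The code performs only the arithmetic; it cannot tell whether the three requirements were met, and that is what decides how far D must be discounted.

```python
from statistics import mean


def difference_in_gain(exp_initial, exp_final, ctl_initial, ctl_final):
    """Return (gain_e, gain_c, d) for the simple two-group plan.

    Each argument is a list of test scores; the group means play the roles
    of E1, E2, C1 and C2 in the text. Hypothetical helper for illustration.
    """
    e1, e2 = mean(exp_initial), mean(exp_final)
    c1, c2 = mean(ctl_initial), mean(ctl_final)
    gain_e = e2 - e1          # Gain E
    gain_c = c2 - c1          # Gain C
    return gain_e, gain_c, gain_e - gain_c  # D = Gain E - Gain C


# Invented scores, not drawn from any experiment cited in this bulletin.
gain_e, gain_c, d = difference_in_gain(
    exp_initial=[40, 52, 47, 61], exp_final=[55, 63, 60, 70],
    ctl_initial=[42, 50, 48, 60], ctl_final=[50, 58, 55, 67],
)
print(gain_e, gain_c, d)
if d > 0:
    print("experimental form of the factor appears more effective")
elif d < 0:
    print("control form of the factor appears more effective")
else:
    print("no difference in gain")
```

Here both groups start from the same mean, so D simply equals the difference between the final means; with unequal starting means the gains, not the final scores, carry the comparison.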


Questions considered in this chapter. In this chapter the follow- 
ing questions are considered: 

1. What is required to secure equivalent groups of pupils? 

2. What are the important educative factors that affect the 
achievements of pupils? 

3. What is involved in controlling the important educative factors 
that affect pupil achievements? 

The analysis of the causes that affect achievement and the determination
of the important educative factors are, of course, only tentative.
Although there are a number of causal investigations in which an
attempt has been made to determine the contributions of certain
factors to achievement and their relative potency, the available evidence
is fragmentary, and there is reason to doubt the validity of the
findings, at least in a number of cases.2 Consideration of experimentation
as a research procedure, however, requires that an attempt be made 
to identify the more important educative factors. In doing this, the 
present writers have endeavored to make use of the best data obtain- 
able, but they are not unmindful of the fact that in this case the best 
data may not be sufficiently valid to accomplish the desired result. 
Consequently, the conclusions presented in the following pages relating 
to the factors to be considered in educational experimentation should 
be thought of as tentative and subject to modification when more 
dependable data are available. 

The significant characteristics of pupil material. In listing the 
significant characteristics of pupil material, we are concerned only 
with those that affect achievement in the field of experimentation. 
Obviously, such characteristics as color of hair, degree of beauty, and
height do not belong in the list. On the other hand, general intelligence
and previous achievement in the field of experimentation must
be included. The following characteristics appear to deserve con- 
sideration: 

1. General intelligence in terms of point scores, or of mental age
2. Chronological age
3. Previous achievement in the field of experimentation
4. Study habits
5. Personality traits (attitudes, ideals, and interests)
6. Physical condition (health)
7. Sex
8. Race

2Burks, B. S. “On the Inadequacy of the Partial and Multiple Correlation Technique,” Journal of Educational Psychology, 17:532-40, 625-30, November, December, 1926.

1. There is abundant evidence that general intelligence, as meas- 
ured by typical intelligence tests, influences the achievement of chil- 
dren. Many investigators have concluded that it is the most impor- 
tant factor. The following conclusion from the report of a recent 
investigation by Heilman is indicative of this belief: 


Our results also appear to show that under the prevailing conditions of the 
home and school organization, intellectual endowment, or whatever is measured 
by the Stanford-Binet test, has by far the most powerful influence in determining 
differences in achievement in the traditional curriculum. It is not unlikely that 
a similar statement could be made for achievement in general.3


This may not be true in the case of some pupils, but the general 
statement appears to be justified. Hence, general intelligence (mental 
age, or test scores) may be placed at the head of the list of significant 
characteristics of pupil material. 

2. The significance of chronological age4 becomes apparent when
a child having a mental age of twelve years and a chronological age of
ten years is compared with one whose corresponding ages are twelve
and fifteen. The first child has an I. Q. of 120 and the second one, an
I. Q. of 80. Although the two children have equivalent mental ages,
the first one is “bright” and the second is “dull.” The significance of
chronological age is further shown by a comparison of two children of
the same I. Q. but of different chronological ages. Although the
children are equally “bright,” the difference in mental ages, as well as the
differences in physiological and social maturity, emphasizes the
importance of chronological age as a factor in school achievement. This
importance is recognized by those who recommend homogeneous grouping
on the two bases, mental age and chronological age.5 An excellent
discussion of the influence of chronological age, or the maturity of
which it is an index, is to be found in a recent monograph by Commins.6


3Heilman, J. D. “Factors Determining Achievement and Grade Location,” The Pedagogical Seminary and Journal of Genetic Psychology, 36:454, September, 1929.
For a comprehensive account of the influence of general intelligence upon school achievement, see: Terman, L. M., et al. “Nature and Nurture, Their Influence Upon Achievement,” Twenty-Seventh Yearbook of the National Society for the Study of Education, Part II. Bloomington, Illinois: Public School Publishing Company, 1928. 397 p.

4The I. Q. might have been listed as a pupil characteristic instead of chronological age. The measurement of the latter is objective; hence it is to be preferred. When chronological age is included, the I. Q. is superfluous.

5Freeman, F. N. Mental Tests. Boston: Houghton Mifflin Company, 1926, p. 23.

6Commins, W. D. “Maturity and Education,” Educational Research Bulletin, Vol. 3, No. 7, Catholic University of America. Washington: Catholic University Press, 1928, p. 36.
3. Previous achievement7 is a significant characteristic of the
pupil material when it functions as a prerequisite for the learning
involved in the experiment. For example, ability to read functions
as a tool in learning arithmetic, geography, history, literature, and
the like.8 Certain abilities in arithmetic and algebra function as tools
in the study of chemistry, and achievement in chemistry contributes 
to achievement in physics. Achievement in the first year of a foreign 
language functions as a tool in the more advanced study of that 
language. It would be easy to enumerate a large number of cases 
in which abilities engendered in a school subject function later in the 
learning of that subject or related subjects. 

Abilities that function as a prerequisite for learning in one school 
subject may, or may not, be significant for learning in another school 
subject. For example, achievement in the first year of a given foreign 
language would be of more significance in an experiment in the second 
year of that language than it would be in an experiment in a different 
language. Achievement in the first year of a foreign language would 
probably be of least significance in an experiment that involved
typewriting. Previous achievement becomes more important as a factor
in achievement in the experiment in proportion as the children have
already experienced subject matter similar in content to that of the
experiment.

4. The term study habits is used to designate a somewhat indefi- 
nite group of procedures employed in doing assignments. Their 
general nature is indicated by samples of the rules proposed by 
Whipple.9

a. When possible, prepare the advance assignment in a given subject directly after the day’s recitation in it.
b. Form a time-study habit.
c. Form a place-study habit.
d. Don’t stop work when you have just barely learned the material, but keep on until you have over-learned it.
e. Begin work promptly.
f. Train yourself to ignore distractions from without.
g. Do your work with the intent to learn and to remember.
h. Mentally review every paragraph as soon as you have read it.

7The total outcome of learning includes general patterns of conduct as well as specific habits and knowledge. Among the possible outcomes are study habits, which are not included here under the head of “previous achievement.”

8For a discussion of the contribution of achievement in reading to achievement in arithmetic, see: Lessenger, W. E. “Reading Difficulties in Arithmetical Computation,” Journal of Educational Research, 11:287-91, April, 1925.

9Whipple, G. M. How to Study Effectively. Bloomington, Illinois: Public School Publishing Company, 1927. 96 p. (Revised edition).
It is evident from an examination of this list that study habits
vary widely in specificity and in value. Habits with respect to the 
time and place in which study is carried on are far more specific in 
nature than those of mentally reviewing a paragraph or studying 
with the intent to remember. Mentally reviewing a paragraph im- 
plies organization of knowledge and as such may be classed as a 
method of thinking. Studying with the intent to remember is an 
attitude toward study that may function in all study situations. While 
the precise effect of conforming to recommended study habits is not 
known, it is probable that their value is a function of the degree of 
their generalization. The more specific study habits may, or may not, 
be useful since the brighter students can get along without them. The 
more general study habits are indispensable since they are the 
methods and the driving forces of reflective thinking. The following 
quotations tend to substantiate the above contentions with respect to 
the importance of this factor in learning activity: 

When the pupil had acquired effective methods of study and observed that 
he really could learn, a new and happy interest was the common result. Some 
of the pupils, for example, began to read books, a thing that had never
previously been done because reading was difficult work.10

During the latter half of the period, the methods used in the preparation of 
actual assignments were given special attention. Certain of the students made 
noticeable progress; one sophomore made his first “A” since entering high school 
while receiving training in ancient history, a freshman showed a decided gain in 
algebra.11

The work of the class was greatly improved through the use of better meth- 
ods of study. The pupils became more independent and more alert to the
importance of the history topics and their relation to our lives today.12

.... that when reasonably effective methods are used to control admission 
to college, the failure of students subsequently is not commonly due to inade- 
quate intelligence; that, on the contrary, the failures are mainly due to several 
factors, among which, according to the reports gained from the students them- 
selves, a prominent place is to be assigned to neglect of proper instructions in 
the art of study.13

What is still more significant, perhaps, is the fact that 87 per cent of the
freshman students enrolled in these two “How to Study” sections completed
their total enrolment in the university in a way that was satisfactory to all their
instructors, while only about half of the freshman students entering the
university last year (46 per cent of the boys and 63 per cent of the girls) completed
their total enrolment in a satisfactory manner.14


10Gates, A. I. “A Study of Reading and Spelling With Special Reference to Disability,” Journal of Educational Research, 6:20, June, 1922.

11Monroe, W. S. and Mohlman, D. K. “Training in the Technique of Study,” University of Illinois Bulletin, Vol. 22, No. 2, Bureau of Educational Research Bulletin No. 20. Urbana: University of Illinois, 1924, p. 20.

12Hisk, E. M. “An Experiment in How to Study,” Elementary School Journal, 27:138.

14Book, W. F. “Results Obtained in a Special ‘How to Study’ Course Given to College Students,” School and Society, 26:534, October 22, 1927.
On the same level of intelligence the methods of study are of great importance.
As a rule the students of low intelligence who were successful in college
were employing good study technique.15


Symonds, in concluding an excellent review of the research on
“How to Study,” is probably correct in stating “.... that the commonly
accepted rules of study are often non-consequential,”16 but
his later statement, “While one would not deny the fact that all these
rules are factors in efficient study one may question their relative
importance,”17 would lead one to believe that some study habits are
important factors in learning activity. While further experimentation
is necessary before it may be known definitely which of the more 
specific study habits are most effective and, therefore, most important 
to the experimenter, it may be safely stated that the status of the 
pupils with respect to study habits, particularly the more general ones, 
should be considered by the experimenter in forming equivalent 
groups. 

5. The term personality traits is used to designate a group of attitudes,
interests, ideals, and other reaction tendencies. This group has
not been fully analyzed, but several traits have been identified that 
appear to influence pupil achievement. After canvassing the available 
literature, Herriott listed five attitudes: 

a. Ambitious – Indifferent
b. Cheerful – Despondent
c. Evaluative – Non-evaluative
d. Persevering – Vacillating
e. Self-confident – Dependent18

His investigation to determine the importance of these attitudes as 
factors of scholastic success indicated that the last three are major 
factors. The third and fourth are related to scholastic success in a 
positive way. That is to say, the student who has the attitudes of 
evaluating and persevering is more successful, in general, than the stu- 
dent whose behavior is characterized by their opposites, non-evaluat- 
ive and vacillating. The fifth is related to success in a negative way. 
Self-confidence is apt to be dangerous as an attitude of cocksureness, 


15Ross, C. C. and Klise, N. M. “Study Methods of College Students in Relation to Intelligence and Achievement,” Educational Administration and Supervision, 13:562, November, 1927.

16Symonds, P. M. “Methods of Investigation of Study Habits,” School and Society, 24:151, July 31, 1926.

17Ibid., p. 152.

18Herriott, M. E. “Attitudes as Factors of Scholastic Success,” University of Illinois Bulletin, Vol. 27, No. 2, Bureau of Educational Research Bulletin No. 47. Urbana: University of Illinois, 1929, p. 31. The two words used to designate an attitude represent opposite extremes of what Herriott calls a “single attitude.” Thus the pupil who is “ambitious” and the one who is “indifferent” are considered as exhibiting extreme differences in the same attitude rather than two different attitudes.
or the instructor is likely to favor an attitude of dependence on himself
and the text.19 Herriott concludes with the following statement
relative to the significance of these traits: “These data support the
belief expressed by many authorities that traits such as study habits
and attitudes are factors of success comparable to the seemingly more
tangible and more usually measured factors such as intelligence and
previous preparation.”20

Statements by several other investigators are reproduced as indica- 
tive of the recognition of the importance of personality traits as fac- 
tors in learning: 

Without doubt, some of the backwardness was due to a lack of interest and [...].21

It is possible to obtain statements of personality traits (moral attitudes, 
emotional maladjustments, and interests) which give correlations of very appre- 
ciable size (about as large as those obtained between tests of intelligence and 
marks) with academic success.22

The most significant factor next to estimated intelligence in its association
with scholarship appeared to be the quality, or composite of qualities, defined
as school attitude.23

It would appear from all available data that the relationship between educational
interests and abilities as expressed in school grades is represented by average
correlations between +.20 and +.40.24

The major groups of causes of scholastic deficiency were found to appear in
the following order of significance:25

                                              Significance
                                                 scores
Motivation and interests ...........................  265
Intellectual factors ...............................  265
Emotional factors ..................................  221
Educational factors ................................  202
Environmental factors ..............................  148
Study habits and methods ...........................  113
Physical factors ...................................   90
Teaching methods and content .......................   32
Motor factors ......................................   28


The data give a minus third-order correlation between general health and 
school marks, and a relatively low correlation between preparation and marks; 
the high correlation of “school attitude” with marks is the striking feature of 
the situation.26


19Herriott, op. cit., p. 42-43.

20Ibid., p. 44.

22Chambers, O. R. “Measurement of Personality Traits,” Research Adventures in University Teaching. Bloomington, Illinois: Public School Publishing Company, 1927, p. 76.

23Fleming, C. W. “A Detailed Analysis of Achievement in the High School,” Teachers College, Columbia University Contributions to Education, No. 196. New York: Bureau of Publications, Teachers College, Columbia University, 1925, p. 185.
In the light of these investigations and several others that might
be mentioned, it appears that personality traits form a significant 
characteristic of pupils when considered as learners in school subjects. 

6. Severe illness and certain types of physical defects, such as
blindness or deafness, are handicaps in learning, but not all aspects
of a pupil’s physical condition are of known significance. That such
physical defects as adenoids, enlarged tonsils, deafness, and poor vision 
are factors in school achievement is indicated in the following quo- 
tations: 

In every case, except in that of vision, the children rated as “dull” are found
to be suffering from physical defects to greater degree than the “normal” or
“bright” children.27

No evidence is apparent that the good or bad condition of the tonsils had
any effect on intelligence, but those children who showed improvement in
condition of tonsils had the highest rate in school achievement.28


The conclusions of Sandwick,29 Hall and Crosby,30 Sumner,31 and
Mallory32 concur with the above in emphasizing the importance of
physical condition as a factor in school achievement. The conclusions
of two recent investigations are not in harmony with those just given,
since the claim is made that physical defects are a minor factor in
scholastic success:


Physical defects were not much more prevalent among the retarded group 
than among the normally progressing group, and therefore, nonpromotion could 
not be attributed to that cause.33

It is obvious, of course, that very serious defects will handicap a child in 
learning. Their influence is probably both direct and indirect. But lesser de- 
fects do not appear to have any causal connection with poor scholarship. In 
fact no association of any kind appears in these data between physical health 
and achievement. Even comparatively serious defects do not necessarily entail 
poor achievement.34


The writers are inclined to favor the view expressed in the con- 
clusion just quoted from Westenberger. The experimenter is justified 
in considering physical condition, so far as the experiment is con- 
cerned, a minor factor in learning activity. Extreme cases of ill 


27Ayres, L. P. Laggards in Our Schools. New York: The Russell Sage Foundation, 1909.

28Hoefer, Carolyn and Hardy, M. C. “The Influence of Improvement in Physical Condition on Intelligence and Educational Achievement,” Twenty-Seventh Yearbook of the National Society for the Study of Education, Part I. Bloomington, Illinois: Public School Publishing Company, 1928, p. 387.

29Sandwick, R. L. “Correlation of Physical Health and Mental Efficiency,” Journal of Educational Research, 1:199-203, March, 1920.

30Hall, [...], and Crosby, Amy. “A Study of the Causes of Inferior Scholarship of Pupils in Low First Grade,” Journal of Educational Research, 14:375-83, December, 1926.

31Sumner, H. W. “Health and Home Factors in Non-Promotions,” Chicago School Journal, [...]:101-103, 1926.

32Mallory, [...] N. “A Study of the Relation of Some Physical Defects to Achievement in the Elementary School,” George Peabody College for Teachers, Contributions to Education, No. 9. Nashville: George Peabody College for Teachers, 1922. 78 p.

33Stalnaker, E. M. and Roller, R. D., Jr. “A Study of One Hundred Nonpromoted Children,” Journal of Educational Research, 16:270, November, 1927.

34Westenberger, E. J. “A Study of the Influence of Physical Defects upon Intelligence and Achievement,” The Catholic University of America, Educational Research Bulletin, Vol. 2, No. 9. Washington: The Catholic Education Press, 1927, p. 45.
health, or physical defect, tend to eliminate themselves from the
ordinary schoolroom. It is probable that both groups will contain
approximately the same number of children with defects, due to the operation
of chance, and even if they do not, the inequality would have to be
considerable to influence appreciably the mean achievement of either
group.
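The argument from chance can be checked with a small simulation. The Python sketch below assumes a hypothetical class of 40 pupils, 4 of whom have defects, divided at random into two groups of 20. It tallies how unevenly the defective pupils fall between the groups over many random divisions, then computes how far an imbalance would move a group mean if each defective pupil scored a stated number of points lower. The class size, the number of defective pupils, and the 10-point handicap are assumptions chosen for illustration, not data from the studies cited.

```python
import random
from collections import Counter


def split_at_random(pupils, rng):
    """Shuffle a copy of the pupils and divide it into two equal groups."""
    shuffled = pupils[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def defect_imbalance(n_pupils=40, n_defective=4, trials=10_000, seed=0):
    """Tally |defectives in group 1 - defectives in group 2| over many
    random divisions of a hypothetical class (True marks a defect)."""
    rng = random.Random(seed)
    pupils = [True] * n_defective + [False] * (n_pupils - n_defective)
    tally = Counter()
    for _ in range(trials):
        g1, g2 = split_at_random(pupils, rng)
        tally[abs(sum(g1) - sum(g2))] += 1
    return tally


def shift_in_group_mean(extra_defective, group_size, handicap):
    """Change in a group's mean when `extra_defective` more of its pupils
    each score `handicap` points lower (assumed size of the handicap)."""
    return extra_defective * handicap / group_size


print(sorted(defect_imbalance().items()))
print(shift_in_group_mean(extra_defective=2, group_size=20, handicap=10))
```

Under these assumptions the exact probability that the groups differ by no more than two defective pupils is 81,700/91,390, or about 0.89, and a two-pupil imbalance with a 10-point handicap moves a mean of 20 scores by one point. A larger handicap or smaller groups would enlarge the shift proportionally.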

7. Both Thorndike35 and Starch36 have concluded on the basis of
the findings of several investigations reviewed by them that sex is
a very minor factor in learning. The conclusions of Minnick37 and
Touton38 are in agreement with those of Thorndike and Starch, but
a recent investigation by Webb shows that when boys and girls of
the same general intelligence are compared with respect to achievement
in geometry, the boys exceed the girls on the lower mental levels
but are exceeded by them on the higher.39 He states “that those
studies of sex differences, which neglect to take into account the
factor of mental ability, fail to discover significant differences between
sex groups which may exist at one mental age level, but not at
another.”40
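Webb's warning can be illustrated with a small set of invented scores. In the hypothetical data below, boys lead at the lower mental level and girls lead at the higher, yet the pooled means for the two sexes are identical, so a comparison that ignores mental level would report no sex difference at all. The numbers are not Webb's, and the helper names are illustrative only.

```python
from statistics import mean

# Invented geometry scores by (mental level, sex); not Webb's data.
scores = {
    ("lower", "boys"): [58, 62, 60],
    ("lower", "girls"): [49, 51, 50],
    ("higher", "boys"): [69, 71, 70],
    ("higher", "girls"): [79, 81, 80],
}


def pooled_mean(sex):
    """Mean score for one sex, ignoring mental level."""
    return mean(s for (level, sx), vals in scores.items() if sx == sex for s in vals)


def mean_by_level(sex, level):
    """Mean score for one sex within one mental level."""
    return mean(scores[(level, sex)])


print("pooled:", pooled_mean("boys"), pooled_mean("girls"))
for level in ("lower", "higher"):
    print(level, mean_by_level("boys", level), mean_by_level("girls", level))
```

The within-level differences cancel in the pooled figures only because the two levels are equally represented; with unequal numbers at each level the pooled comparison could even point in either direction.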

Fitzgerald and Ludeman have reported the results of an investigation
in which it was found that in the sixth and seventh grades
boys achieved more in history than girls, but in the eighth grade the
greater achievement was shown by the girls.41 Van Wagenen found
sex differences in learning American history to be great enough to
warrant establishing two sets of norms for his “American History
Scales.”42 Fisher discovered that a loss of efficiency in mechanical
learning takes place a year earlier in girls than in boys.43

From these more recent studies the conclusion may be drawn 
that sex is a factor of less importance than those described in the 
preceding pages, but it should not be neglected by the educational 
experimenter who seeks highly dependable results. 


35Thorndike, E. L. Educational Psychology, Vol. III. New York: Teachers College, Columbia University, 1914, p. 169-205.
36Starch, Daniel. Educational Psychology. New York: The Macmillan Company, 1919, p. 68-72.
37Minnick, J. H. “A Comparative Study of the Mathematical Ability of Boys and Girls,” School Review, 23:73-84, February, 1915.
38Touton, F. C. “Sex Differences in Geometric Abilities,” Journal of Educational Psychology, 15:246-47, April, 1924.
39Webb, P. E. “A Study of Geometric Abilities Among Boys and Girls of Equal Mental Abilities,” Journal of Educational Research, 15:256-62, April, 1927.
40Ibid., p. 262.
41Fitzgerald, J. A. and Ludeman, W. W. “Sex Differences in History Ability,” Peabody Journal of Education, 6:175-81, November, 1928.
42Van Wagenen, M. J. “Historical Information and Judgment in Pupils of Elementary Schools,” Teachers College, Columbia University Contributions to Education, No. 101. New York: Bureau of Publications, Teachers College, Columbia University, 1919. 74 p.
43Fisher, V. E. “A Few Notes on Age and Sex Differences in Mechanical Learning,” Journal of Educational Psychology, 18:562-564, November, 1927.
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8. The importance of race as a factor in learning is difficult to de- 
termine.** Various other factors, such as language handicap, social 
status, parental occupation, and other environmental influences, tend 
to obscure its significance. Although it may be impossible to ascribe 
differences between racial groups to something inherent in their re- 
spective races, it is none the less evident that actual differences exist 
in school achievement. The following are typical of the conclusions 
reached by investigators.


Statistical data carefully collected and presented in the foregoing study 
indicate rather conclusively that primary French-speaking children in certain 
Louisiana parishes are lower in achievement than English-speaking children and 
are seriously retarded.45


The authors ascribe the lower achievement of the French-speaking 
children to the language handicap. 


The number of months by which the median educational age of the entire 
group of white children exceeds the median of the negro group was found to 
be 16.7 months. It was found further that only 14.5 per cent of the negro children 
reach or exceed the median educational age of the white children.46


A recent study of the retardation of seventeen hundred children of 
immigrants in two cities of northern Michigan shows that retardation 
according to nationality follows very closely the median intelligence 
quotients of the nationalities.47

The conclusion may be drawn that whether there are inherent
differences in race or not, characteristics distinctive of racial groups, 
such as language ability, social status, parental occupations, customs, 
prejudices, attitudes, and the like, are of enough significance in school 
work that race must be considered by the educational experimenter 
who desires to set up equivalent groups. 

Significant characteristics of pupil material not independent traits. 
Several of the characteristics of pupil material described in the
preceding pages are not independent. The correlation between general
intelligence and previous achievement in any one school subject, for a
single school grade, is likely to fall between .11 and .69.48 The
interdependence of general intelligence and general
school achievement is expressed in the following statement from 


44Race may influence achievement indirectly through intelligence since races will differ 
in capacity to learn as they vary in intelligence. This aspect of the race factor does not 
concern us here since equating pupils with respect to intelligence will take care of it. We are 
interested in the more direct influences of racial characteristics on learning.

45Brouillette, J. W., Foote, I. P., Robert, E. B., and Terrebonne, L. P. A Comparative 
Study of the School Progress of Foreign-Speaking and English-Speaking Children in the Early 
Elementary Grades. Chicago: Scott, Foresman and Company, 1928, p. 62-63.

46Witty, P. A. and Decker, A. I. “A Comparative Study of the Educational Attainment
of Negro and White Children,” Journal of Educational Psychology, 18:498-99, October, 1927.

47Brown, G. L. “Intelligence as Related to Nationality,” Journal of Educational Research,
5:326, April, 1922.

48Gates, A. I. “The Correlations of Achievement in School Subjects with Intelligence
Tests and Other Variables,” Journal of Educational Psychology, 13:280, May, 1922. 
Kelley: “On the average, in the neighborhood of .90 of the capacity
measured by an all-round achievement battery score—reading, arith- 
metic, science, history, etc.—and of the capacity measured by 
a general intelligence test is one and the same.”49 General
intelligence is also positively related to study habits and personality traits.
Butterweck has shown that the brighter pupils in his investigation 
tended to employ the better study habits.50 Herriott, in the research
already referred to, found a small though significant positive rela- 
tionship between general intelligence and the personality traits listed 
on page 23. In typical grade groups the correlation between chrono- 
logical age and mental age is negative. Brighter children tend to be 
accelerated, while duller children are retarded, so that in a given
school class, the relatively brighter are the younger, and the relatively 
duller are the older. Terman reports a high negative correlation 
(— .74) between the I.Q. and the chronological age of a group of 
children entering high school as freshmen.51 Baldwin has shown that,
in general, children who are gifted mentally are also superior
physically.52 Hoefer and Hardy state that children whose physical
condition is good have a more rapid mental growth than children whose
physical condition is fair.53 Sex and race are not to be thought of
as variables that may be correlated with intelligence. They are the 
least significant of the factors listed as characteristics of pupil ma- 
terial and are the most independent. 

The fact that several of the characteristics are positively corre- 
lated with general intelligence means that if two groups of pupils are 
equivalent with respect to mental age, or intelligence-test scores, under 
typical conditions, they are likely to approach equivalence with respect 
to previous achievement, study habits, personality traits, and physi- 
cal condition. 

Securing equivalent groups for a controlled experiment.54 It is
relatively easy to assemble two groups that are equivalent with refer- 
ence to a given characteristic, provided that characteristic can be 
measured accurately. For example, pupils may be paired on the basis 
of mental age, or intelligence-test scores, so that for each pupil in one 


49Kelley, T. L. Interpretation of Educational Measurements. Yonkers-on-Hudson, New
York: World Book Company, 1927, p. 21.

50Butterweck, J. S. “The How to Study Problem,” Journal of Educational Research,
18:66-76, June, 1928.

51Terman, L. M. The Intelligence of School Children. Boston: Houghton Mifflin and
Company, 1919, p. 82. 

52Baldwin, B. T. “Anthropometric Measurements,” Genetic Studies of Genius, Vol. I.
Stanford University, California: Stanford University Press, 1925, p. 135-71.

53Hoefer and Hardy, op. cit., p. 371-87.

54One group of pupils is equivalent to another with respect to a given characteristic when
for each pupil in one group there is a mate in the second who possesses the same amount
of the characteristic. An approach to equivalence is secured when the central tendency and
variability of one group with respect to a given characteristic are equal to these measures of
the other.
group there will be a mate in the second group having the same mental
age or test score. Obviously, it would be difficult, if not impossible, 
under typical conditions to assemble two groups by locating pairs of 
pupils that are equivalent in respect to all significant characteristics. 
Hence, in assembling equivalent groups by pairing, the experimenter 
usually considers only one or at most two characteristics. When the 
groups have been assembled, they should be checked for equivalence 
with respect to the remaining significant characteristics. For example, 
if two groups have been assembled by pairing pupils having the same 
mental ages, or intelligence-test scores, the mean and standard devia- 
tion of each group should be calculated for chronological age, and 
previous achievement, when it is significant. If the mean and stand- 
ard deviation of one group are not approximately equal to those of 
the other group, adjustments should be made to secure approximate 
equality or the pair of groups rejected for experimentation. If ade- 
quate measuring instruments were available, it would be desirable also 
to check the equivalence of the groups with respect to study habits 
and personality traits in the same way. The equivalence of the groups 
with respect to sex and race should be checked to make certain the 
groups exhibit no marked differences with respect to these character- 
istics. The experimenter should also make certain that the two groups 
involve no serious differences in physical condition. 
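The pairing-and-checking procedure just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure taken from any of the studies cited: pupils are represented as hypothetical dictionaries, mates are taken as adjacent pupils in a ranking on mental age (the nearest practical substitute for an exact match), and the tolerances that count as “approximately equal” are placeholder values that the experimenter would have to choose.

```python
from statistics import mean, pstdev

def pair_on(pupils, key):
    """Rank pupils on one characteristic and split each adjacent pair of the
    ranking between two groups, so that every pupil in one group has a mate
    with (nearly) the same score in the other. With an odd number of pupils
    the last one in the ranking is left unpaired."""
    ranked = sorted(pupils, key=lambda p: p[key])
    group_a, group_b = [], []
    for n, i in enumerate(range(0, len(ranked) - 1, 2)):
        lower, higher = ranked[i], ranked[i + 1]
        # Alternate which group receives the lower mate so that neither
        # group is systematically a little lower on the pairing variable.
        if n % 2 == 0:
            group_a.append(lower)
            group_b.append(higher)
        else:
            group_a.append(higher)
            group_b.append(lower)
    return group_a, group_b

def mean_and_sd(group, key):
    values = [p[key] for p in group]
    return mean(values), pstdev(values)

def check_equivalence(group_a, group_b, key, max_mean_diff, max_sd_diff):
    """Check a characteristic NOT used for pairing: the two means and the
    two standard deviations must each agree within the given tolerances."""
    mean_a, sd_a = mean_and_sd(group_a, key)
    mean_b, sd_b = mean_and_sd(group_b, key)
    return (abs(mean_a - mean_b) <= max_mean_diff
            and abs(sd_a - sd_b) <= max_sd_diff)

# Hypothetical pupils; ages are in months.
pupils = [
    {"name": "P1", "mental_age": 120, "chron_age": 130},
    {"name": "P2", "mental_age": 121, "chron_age": 128},
    {"name": "P3", "mental_age": 130, "chron_age": 126},
    {"name": "P4", "mental_age": 131, "chron_age": 127},
    {"name": "P5", "mental_age": 140, "chron_age": 124},
    {"name": "P6", "mental_age": 142, "chron_age": 125},
]
group_a, group_b = pair_on(pupils, "mental_age")
print([p["name"] for p in group_a])  # ['P1', 'P4', 'P5']
print([p["name"] for p in group_b])  # ['P2', 'P3', 'P6']
print(check_equivalence(group_a, group_b, "chron_age", 2, 2))  # True
```

In this made-up example the chronological-age means differ by about 0.7 month and the standard deviations by about 1.2 months, so a stricter standard-deviation tolerance of 1 month would reject the pair of groups, the case in which the text calls for adjustment or rejection.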

A technique seems to be evolving for selecting pairs of pupils on 
the basis of a composite measure of characteristics. According to one 
technique, if it is desired to pair children on the basis of two charac- 
teristics, such as intelligence and previous achievement, a correlation 
chart of the test scores for intelligence and achievement may be con- 
structed. The position of each child with respect to both of these 
characteristics is shown by a single dot on the chart. The experi- 
menter selects his pairs by locating dots which are closest together. 
An illustration of the use of this technique is to be found in the report 
of an experiment by Butterweck.55 Another technique is that of
combining measures of different characteristics into one composite measure
representing all of them. The children are then paired on the basis 
of the composite scores. The use of this technique is illustrated in the 
experiment of Douglass on the relative effectiveness of two sequences 
in supervised study.56 Before this technique can be commended as the


55Butterweck, J. S. “The Problem of Teaching High-School Pupils How to Study,”
Teachers College, Columbia University Contributions to Education, No. 237. New York:
Bureau of Publications, Teachers College, Columbia University, 1926.

56Douglass, H. R. “The Experimental Comparison of the Relative Effectiveness of Two
Sequences in Supervised Study,” University of Oregon Publication, Education Series, Vol. 1, 
No. 4. Eugene, Oregon: University of Oregon, 1927, p. 173-218. 
best technique to use, research must determine the weights to be given
each characteristic in the composite score. 
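Both pairing techniques can be sketched in Python. This is an illustration, not the procedure of Butterweck or Douglass: converting each test to standard (z) scores so the two scales are comparable, measuring the closeness of two dots by straight-line distance, pairing greedily (closest remaining pair first), and using equal weights in the composite are all assumptions. The equal weights in particular are placeholders, since the proper weights remain to be determined.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

def standardize(values):
    """Convert raw scores to z-scores (assumed here so that two tests with
    different scales count equally in distances and composites)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def closest_dot_pairs(names, xs, ys):
    """Correlation-chart technique: each pupil is a dot (x, y); repeatedly
    pair the two remaining dots that lie closest together (greedy)."""
    dots = dict(zip(names, zip(standardize(xs), standardize(ys))))
    pairs = []
    while len(dots) >= 2:
        a, b = min(combinations(dots, 2),
                   key=lambda ab: dist(dots[ab[0]], dots[ab[1]]))
        pairs.append((a, b))
        del dots[a], dots[b]
    return pairs

def composite_pairs(names, xs, ys, w_x=0.5, w_y=0.5):
    """Composite technique: combine the two standardized measures into one
    weighted score, rank pupils on it, and pair neighbours in the ranking.
    The default equal weights are hypothetical placeholders."""
    composite = [w_x * zx + w_y * zy
                 for zx, zy in zip(standardize(xs), standardize(ys))]
    ranked = [name for _, name in sorted(zip(composite, names))]
    return [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked) - 1, 2)]

# Hypothetical intelligence and previous-achievement scores for four pupils.
names = ["Ann", "Ben", "Cal", "Dee"]
intelligence = [100, 102, 120, 118]
achievement = [50, 52, 70, 69]
print(closest_dot_pairs(names, intelligence, achievement))
# [('Cal', 'Dee'), ('Ann', 'Ben')]
print(composite_pairs(names, intelligence, achievement))
# [('Ann', 'Ben'), ('Dee', 'Cal')]
```

Greedy closest-first pairing is simple but does not guarantee the smallest total distance over all pairs; an optimal matching procedure would be needed for that.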

Melby and Lien57 have reported on a technique for controlling pupil
factors which does not involve the use of pairing procedures. The 
initial status of three or more available groups is determined with 
respect to intelligence and previous achievement. After the order of 
superiority of the groups has been determined from comparison of 
their initial status, the experimenter selects one of the average groups 
as the experimental group. The assumption is made that if the medi- 
ocre experimental group exceeds in final achievement the initially su- 
perior group, then superiority of the experimental factor is dependably 
indicated. If, however, the final achievement of the mediocre experi- 
mental group falls below that of the initially inferior group, then the 
inferiority of the experimental factor is shown. The technique is com- 
mendable in that it permits the use of ordinary school classes without 
modification. It cannot be regarded, however, as anything more than 
a “practicable technique,” as it is labeled by Melby and Lien. The 
technique lacks precision in that the difference in gains in achievement 
is not ascribable to educative factors alone, as is the case when equiv- 
alent groups are used. Since it lacks this precision, it is difficult to see 
how clear-cut conclusions may be drawn from an experiment in which 
it is used. 
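The selection and decision rules of the Melby and Lien technique can be expressed compactly. This is a minimal sketch, assuming that each class's initial status (intelligence and previous achievement together) has already been reduced to a single comparable figure, which the description above does not specify; the class names and scores are hypothetical. The third outcome, in which the experimental class finishes between the other two, is not discussed in the text; here it is returned as inconclusive, in keeping with the lack of precision just noted.

```python
def choose_groups(initial_status):
    """Rank three or more classes on initial status and take a middle one
    as the experimental group. Returns (superior, experimental, inferior)."""
    if len(initial_status) < 3:
        raise ValueError("the technique needs three or more groups")
    ranked = sorted(initial_status, key=initial_status.get, reverse=True)
    return ranked[0], ranked[len(ranked) // 2], ranked[-1]

def verdict(final, superior, experimental, inferior):
    """Apply the decision rule to final-achievement figures."""
    if final[experimental] > final[superior]:
        return "experimental factor superior"
    if final[experimental] < final[inferior]:
        return "experimental factor inferior"
    return "inconclusive"

initial = {"Class 1": 48.0, "Class 2": 55.0, "Class 3": 42.0}
superior, experimental, inferior = choose_groups(initial)
print(superior, experimental, inferior)  # Class 2 Class 1 Class 3
final = {"Class 1": 71.0, "Class 2": 69.5, "Class 3": 60.0}
print(verdict(final, superior, experimental, inferior))
# experimental factor superior
```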

The educative factors that affect pupil achievement. The educative 
factors that affect pupil achievement are grouped here under the 
following heads: 

I. Teacher factors 
II. General school factors 
III. Extra-school factors 

I. Teacher factors that affect pupil achievement. Amount of train- 
ing, teaching experience, intellectual status, personality, physical con- 
dition, sex, and age are usually listed as important teacher factors, 
but they influence pupil achievement for the most part indirectly 
through their contributions to more immediate factors. For example, 
training, amount of teaching experience, and intellectual status con- 
tribute to the teacher’s instructional techniques and to his skill in 
the use of them; hence, these factors influence pupil achievement in- 
directly. Our problem here is to determine the teacher factors that 
influence pupil achievement directly. 


57Melby, E. O. and Lien, Agnes. “A Practicable Technique for Determining the Relative
… of Different Methods of Teaching,” Journal of Educational Research, 19:255-64,
April, 1929.
Credit is given to Professor John G. Rockwell of the University of Minnesota for
devising this technique and using it in an experiment on thyroid deficiency.
In The Commonwealth Teacher-Training Study, Charters and
Waples determined a list of twenty-five teacher traits58 by
interviewing a number of persons considered competent and by listing the “trait
names” and “trait actions” mentioned by these persons as character- 
istics of good teachers. In view of the comprehensiveness of this 
study, it might be argued that the twenty-five traits listed, or at least 
those of highest rank, should be taken as the teacher factors that 
affect pupil achievement and, hence, as the teacher factors that should 
be controlled in an experiment. It does not appear satisfactory to do 
this. The list is too long and does not include instructional techniques 
or classroom-management procedures. Consequently, the present 
writers propose the following list of teacher factors for consideration. 
Evidence of the potency of each of these factors is presented as a 
basis for a conclusion in regard to the ones that must be controlled in 
experimentation in order to avoid introducing a serious error in the 
results. 

1. Instructional techniques
   a. Learning exercises
   b. Motivation procedures
   c. Directive procedures
   d. Diagnostic procedures
2. Classroom-management procedures
3. Skill in carrying out instructional techniques and classroom-management procedures
4. Zeal of the teacher with reference to experimental factor
5. Personality traits
6. Physical condition
7. Sex
8. Age

1. The more influential instructional techniques may be classified 
under four heads: (a) learning exercises; (b) motivation procedures; 
(c) directive procedures; (d) diagnostic procedures. The attention 
given to methods courses in the professional training of teachers is 
evidence of the conviction that the instructional techniques employed 
by a teacher affect the achievements of his pupils. Hence, it is not 
necessary to present evidence in justification of them as important 
teacher factors.59 It should be noted, however, that the influence of
58Charters, W. W. and Waples, Douglas. The Commonwealth Teacher-Training Study.
Chicago: University of Chicago Press, 1929, p. 18.
59Evidence is afforded by investigations of the relation between achievement
in the field of methods of teaching and teaching ability. In one such investigation the partial
correlation (between “ability to pass a professional test” and “general teaching ability”) was ….
Knight, F. B. “Qualities Related to Success in Teaching,” Teachers College, Columbia
University Contributions to Education, No. 120. New York: Bureau of Publications, Teachers
College, Columbia University, 1922, p. 42.
a given technique depends on its appropriateness. In order to be
most effective, a given technique must be suited to the pupils, com- 
patible with the objectives to be attained, and supplemented by other 
techniques. For example, a learning exercise suitable for pupils on 
the lower levels of intelligence is not likely to be a good one for bright 
pupils. Certain types of drill exercises in arithmetic have been demon- 
strated to be effective relative to the attainment of certain objectives, 
but they are not effective when other objectives are to be attained. A 
“good” learning exercise is likely to be relatively ineffective unless 
it is supplemented by appropriate directive and diagnostic procedures. 
The rule that practice should be distributed rather than concentrated 
is further evidence that the influence of a given instructional tech- 
nique depends upon factors other than its intrinsic character. 

2. Classroom-management procedures include such items as taking
the roll, distributing and collecting materials, starting the work of
the period, dismissing the class when the pupils go to another
room at the end of the period, and dealing with disciplinary cases.
The importance of these procedures is generally recognized. In fact, 
until recent years the teacher’s ability as a disciplinarian was con- 
sidered to be the most important of his qualifications. While other 
aspects of teaching are now considered of more importance than the 
mere maintenance of order, adequate attention to routine matters of 
classroom management, inclusive of discipline, is regarded as essen- 
tial for the promotion of the most suitable environment for learning. 
If, however, distinctly undesirable practices are avoided, it appears 
likely that variations in classroom-management procedures will not 
affect pupil achievement to a significant extent. 

3. The effectiveness of an instructional technique or a classroom- 
management procedure depends upon the skill with which it is carried 
out. This factor was implied in the discussion of instructional tech- 
niques, but its importance justifies more specific consideration. 
Although we have no means of securing precise measures of teaching 
skill, it is obvious that some teachers are more skillful in carrying 
out certain instructional techniques than are other teachers. When 
a new technique is being compared with a familiar one, it is likely 
that the new one will be applied less skillfully. For example, sup- 
pose an experiment is devised to determine the effect of supervised 
study in comparison with study without supervision. Suppose further 
that the plan of supervising study has been worked out so that the 
procedure is specified in detail. If a teacher, who has become a skill- 
ful instructor under a plan that does not involve supervised study, 
but who has not had experience in supervising study, attempts to
teach one class employing supervised study and another without 
supervised study, it is reasonable to expect that he will be consider- 
ably more skillful in teaching the second class. If this is the case, 
the experiment would furnish a comparison between skillful teaching 
without supervised study and teaching with supervised study some- 
what crudely carried out. Hence, the experiment would not yield 
satisfactory evidence of the relative merits of skillful teaching with 
supervised study and skillful teaching without supervised study. 

An illustration of the recognition of the importance of skill as an 
educative factor is afforded by the Newark Phonics Experiment. The 
teachers of the experimental classes, the principals of the schools in- 
volved, and the members of the Experimental Committee met to- 
gether and formulated a detailed working plan. Then the plan was 
tried out for a semester before the real experiment was begun.60

4. The zeal that a teacher exhibits in carrying out the instruc- 
tional techniques he is employing is a subtle factor. It is closely 
related to the factor of skill, and perhaps the two overlap to some 
extent, but there is evidence that indicates the presence of an important 
educative factor that differs in some respects from skill. It is reason- 
able to expect that a teacher will exhibit greater zeal when employ- 
ing a method that he believes in than when employing one that he 
does not like. The influence upon pupil achievement of the teacher’s 
preference in regard to methods is indicated in an unpublished
report61 of an experiment to determine the relative merits of
instructional procedures that may be designated as Method A and Method
B. Several teachers cooperated in the experiment, each one teaching 
one class according to Method A and another class according to 
Method B. The following results were secured. 


                                    Number      Mean     Mean Scholastic
                                   of Pupils    Score        Grade
Pupils taught by Method A .......     417         …           83.9
The teachers were asked to indicate which method they preferred.
The following results were obtained when the data were tabulated 
according to the preference of the teachers. 


60Sexton, E. K. and Herron, J. S. “The Newark Phonics Experiment,” Elementary School
Journal, …:690-701, May, 1928.
61The writers are indebted to Dr. Rosalie M. Parr, of the University of Illinois, for these
data. A report of this study is to be published in the Journal of Chemical Education.
Gain in favor of Method B


The experiment was carried on during the latter part of the semes- 
ter and the scholastic grades are probably a fair index of the equiv- 
alence of the two groups of pupils. According to this criterion, the 
paired groups are approximately equivalent except in the case of those 
taught by teachers preferring Method A. The difference for this pair
of groups, however, is small in comparison with the difference between 
the mean scores. Furthermore, it may be that the scholastic grades 
of these pupils were influenced by their performances during the
experiment.62 The differences between the mean scores of the several
pairs of groups strongly suggest that the preference of the teachers 
in regard to the method of teaching affected the achievements of the 
pupils. If it is assumed that the preference in regard to methods 
affected the zeal of the teachers, it follows that this characteristic of 
teaching is an important educative factor and, hence, must be con- 
trolled in order to secure dependable results. 

Douglass63 has reported data that may appear to be in opposition
to the conclusion just stated. At the close of an experiment to de- 
termine the relative effectiveness of two sequences of supervised study, 


62The papers of the test given at the close of the experiment were scored by a central
committee, and, consequently, the teachers did not know the results until after the grades had
been assigned. But it is likely that the teachers had some idea of the achievements of the
pupils during the experiment, and since the pupils taught by Method A by teachers
preferring this method did much better work than the pupils taught by Method B by the
same teachers, it is not unlikely that the difference in mean scholastic grades is due in part
to this fact. If this hypothesis is correct, these two groups were more nearly equivalent than
the mean semester grades indicate.

63Douglass, op. cit., p. 173-218.
all instructors involved in the investigation were asked to express an
opinion in regard to the relative merit of the two instructional pro- 
cedures. Nine of the fourteen opinions expressed were contrary to the 
experimental results in the pair of classes taught by the teacher giving 
the opinion. The conditions of this investigation differ in certain sig- 
nificant respects from the one described in the preceding paragraphs. 
In the first place, the Douglass experiment was carried on in the Uni- 
versity High School at the University of Oregon. The other experi- 
ment was cooperative and involved about as many different schools 
as there were teachers. Another difference is that Douglass asked his 
teachers to express an opinion in regard to the results of the experi- 
ment, whereas in the other experiment the teachers were asked to indi- 
cate the method they preferred. Finally, the teachers in a University 
High School are likely to be more scientifically-minded than teachers 
in typical high schools and, hence, would be less likely to have strong 
preferences and more likely to be equally zealous in carrying out both 
of the methods. 

The statement is made in the report of the experiment by Melby 
and Lien that, “The teacher, in fact, was secretly hoping that the 
results would reflect credit on the experimental method .... Yet 
results favored the control groups.”64 This should not be interpreted
to mean that the data of this experiment are such as to minimize the 
importance of zeal as a factor. It is evident from the description of 
the procedures employed in the instruction of the control pupils that 
considerable zeal was exercised in spite of the teacher’s dislike for 
these procedures. The inference that may be drawn from the report 
of this experiment is that this teacher was also sufficiently
scientifically-minded to control the factor of zeal. It, therefore, appears
that the Melby and Lien experiment and the one by Douglass are not 
necessarily in opposition to the one previously described. This con- 
clusion is supported by evidence from other investigations. 

The influence of some teacher factor, which probably was zeal, is 
revealed in the Newark Phonics Experiment. “. . . the results show
conclusively that there is immeasurably less difference between classes 
taught with and without phonics than between different schools. 
Where the results were unusually good in a class taught by a teacher 
using phonics, they were unusually good when the same teacher taught
without phonics. On the other hand, poor results were secured in both 
phonic and non-phonic groups taught by the same teacher.”65


64Melby and Lien, op. cit., p. 264.
65Sexton and Herron, op. cit., p. 701.
In the experiment by Collings66 the children taught by the project
method achieved more than those taught by the traditional method, 
but it appears from Collings’ report that these teachers worked much 
harder at their task than did the teachers in the control schools. In 
view of this fact, it does not appear justifiable to ascribe the superior 
achievement of the project-method group entirely to the method of 
instruction. The unusual zeal of the teachers undoubtedly contributed 
a large portion of the superiority in achievement. This conclusion is 
supported by an investigation reported by Gates.67 The account of
the experiment indicates that the teachers employing the “modern 
systematic method” exhibited as much zeal as those employing the 
“opportunistic method.” The results favor the former. Although this 
experiment differs in several respects from the one conducted by Col- 
lings, they are sufficiently alike to justify the conclusion that the zeal 
of the teacher is a potent educative factor. 

More direct evidence of the effect of a high degree of zeal upon 
achievement is furnished by an investigation by Pittman.68 The
description of the activities of the teachers in the experimental group of
schools makes it apparent that they exhibited a very high degree of 
zeal. For example, it is stated: “The teachers under professional 
supervision did approximately four times as much professional reading 
as they themselves had done during the previous year, or as the un- 
supervised group, with which they were compared, did during the year 
of the experiment.”69 The description of this experiment would not be
seriously distorted if the zeal of the teachers were designated as the
experimental factor. Hence, the distinct superiority in achievement 
of the pupils in the experimental schools was undoubtedly due in a 
large measure, either directly or indirectly, to the zeal of the teachers. 

After a relatively elaborate and careful study of the factors related 
to teaching success, Knight concluded that “. . . general factor of
interest in one’s work becomes the dominant factor in determining 
one’s success in teaching . . . . it is reasonable to suppose that genuine 
interest in one’s work accounts for a large part of teaching success.”70

5. The term personality traits is used here as a name for a complex 
of subtle teacher factors that are commonly designated by such terms 
as “general appearance,” “voice,” “self-control,” “tact,” “sympathy,” 
“sense of justice,” and “loyalty.” Such traits have not been defined 


66Collings, Ellsworth. An Experiment with a Project Curriculum. New York: The
Macmillan Company, 1923. 346 p.
so that satisfactory measurement is possible, and, consequently, we
do not have any definite measure of their effect upon pupil
achievement. There is, however, a wide-spread conviction71 that the
“personality”72 of the teacher is an important educative factor. This
conviction is supported by some evidence. Morris has reported a partial
coefficient of correlation of .463 between success in practice teaching
and “trait index.”73 Hence, it seems safe to conclude that “personality
traits” do affect pupil achievement to such an extent that they
cannot be safely neglected in an experiment.

6, 7, and 8. The teacher’s physical condition, sex, and age have 
an indirect influence on school achievement in so far as they condition 
the zeal with which a teacher employs instructional procedures and 
the skill with which he uses them. The teacher’s physical condition, 
sex, and age may influence directly the achievement of children by 
engendering attitudes that may be beneficial, or detrimental, to learn- 
ing. The relation of the teacher’s physical condition to teaching effi- 
ciency is indicated in studies made of teacher failure. In a study by 
Buellesfield poor health takes twelfth rank as a chief cause and
twentieth rank as a contributory cause.74 Littler ranks poor health lowest
of seven causes of teacher failure.75 Moses places it eleventh in point
of frequency; there are no less frequent causes mentioned.76 In a
recent study of teacher failure Madsen reports physical condition as 
the cause in only two out of thirty-one cases, in one case deafness 
and in the other case general physical disability.77

Some correlation coefficients have been determined in an effort to 
indicate the relation of health or physical condition to teaching effi- 
ciency. Bradley states that the correlation between “general merit” 
and physical efficiency is .59, the lowest of several listed by him.78

71The Commonwealth Teacher-Training Study referred to on page 31 affords conclusive
evidence of this statement. For senior high-school teachers the ten traits considered most
important are: breadth of interest, self-control, good judgment, leadership, scholarship
(intellectual curiosity), forcefulness, honesty, adaptability, enthusiasm, and open-mindedness.

72Although this term has not been defined with precision, as it is commonly used, it
undoubtedly includes zeal (as the term has been used in the preceding pages) and probably
overlaps with skill. Hence, the term “personality traits,” as it is used here, is not synonymous
with “personality.”

73Morris, E. H. “Personal Traits and Success in Teaching,” Teachers College, Columbia
University Contributions to Education, No. 342. New York: Bureau of Publications, Teachers
Boyce gives a smaller correlation coefficient, .18, between health and
general merit.79 Ruediger and Strayer report the correlation to be .04
between general merit and health.80 The recent study of Whitney
gives a coefficient of .124 between physique and teaching success after
graduation.81 However, it should be stated that this is a greater
relationship than was found for intelligence and success after graduation.
Whitney places physique as the fourth most important item in the pre- 
diction of success in teaching. It follows student teaching, professional 
marks, and academic marks. The weight of the evidence is in favor 
of regarding the physical condition of the teacher, so long as extremes 
are avoided, as a minor factor in the achievement of the pupils. 

There is no evidence to be found in the literature which would in- 
dicate that the teacher’s sex is an important factor in the learning 
activity of the pupils. It is said that the pre-adolescent boy prefers 
men teachers to women teachers, but it is yet to be proven that this 
prejudice, even assuming it to be universally existent, is sufficient to 
decrease his achievement significantly. 

The age of the teacher is not usually a significant factor in suc- 
cessful teaching. After stating that the correlation of teaching skill 
with age was negligible for a group of Massachusetts teachers, Knight 
goes on to say: 


We know there is some correlation between age in general and teaching 
ability. A five-year-old child could not teach, and excessive old age would no 
doubt be negatively correlated. But within those age limits during which men 
and women ordinarily teach, age does not appear to be correlated with teaching 
skill. The younger teachers are not the best as a current superstition would 
lead us to think; nor do years of tenure make material additions to skill.” 


This estimate of the importance of age as a factor in teaching is 
concurred in by Whitney who states, “Age is not a particularly im- 
portant element in good teaching.’** It seems safe to assume, there- 
fore, that so long as extremes are avoided, the teacher’s age is not a 
significant factor in the learning activity of the pupils. 

Boyce, A. C. “Qualities of Merit in Secondary School Teachers,” Journal Educati

The control of teacher factors. The preceding consideration of
teacher factors has demonstrated the necessity of controlling, i.e.,
keeping the same, at least four teacher factors: (1) instructional
procedures, (2) skill in carrying out these procedures, (3) zeal of the
teacher, and (4) personality traits. Control of instructional
techniques may be approached by giving the teachers participating in the
experiment detailed instructions with regard to the conduct of their
classes during the experiment. An example of such an attempt to con- 
trol this factor is illustrated in the following quotation from the re- 
port of an experiment by Coryell: 


In order to secure uniformity of procedure and the consistent carrying out 
of the prescribed methods, the teachers who were to collaborate in the experi- 
ment met from time to time in conference for devising ways and means. The 
class work for the first week was planned in the minutest detail. The same 
questions were actually used by all three teachers and the lesson plans were 
followed as exactly as was humanly possible while conducting a live recitation. 
Many other plans were made and used in common, and where no detailed lesson 
plan was drafted for all three teachers to use, the work to be covered each day 
was broadly outlined.™* 


From one point of view, it is not sufficient to secure equivalence 
of instructional techniques. They should be representative of sound 
educational practice. This requires, among other things, that there 
be adaptation of techniques to the needs and purposes of the pupils 
as they are revealed in the course of the instruction. Hence, if control 
of instructional techniques is carried too far, the requirement of sound 
educational practice may be violated. 

The control of skill and of zeal is much more difficult. A speci- 
fied degree of skill or of zeal cannot be secured by asking teachers 
to follow certain instructions. Any direct request to be more skillful, 
and especially to be more zealous, may produce the opposite result. 
By exercising care in selecting teachers and by making use of in- 
direct devices, a skillful experimenter may secure approximate equiva- 
lence of these two teacher factors, but, since neither can be measured 
objectively, he cannot be certain that he has done so. 

A method frequently employed to control these factors is the 
rotation of teachers at the mid-point of the experimental period. The 
teacher who has been teaching the experimental group exchanges with 
one who has been teaching the control group.®® Thus, any difference 
in skill or zeal on the part of two teachers is expected to be corrected 
by the fact that both the experimental and control pupils have re- 
ceived an equal amount of stimulation from both teachers. The ex- 
periment of Douglass is an example of the use of this technique.*® 
This procedure will be successful in securing equivalence of these fac- 
tors when each teacher is equally skillful in carrying out both forms of 
the experimental factor and is equally zealous in doing so. A teacher 
might teach with equal skill and zeal in employing two different
techniques, but it appears likely that most teachers, because of a lack of
familiarity with, or a dislike for, one of the procedures, might teach
with less skill and zeal in one of the groups than in the other. When
this occurs, the rotation procedure will not succeed in securing
control of skill and zeal except by chance.

Coryell, N. G. “An Evaluation of Extensive and Intensive Teaching of Literature,”

Another plan for securing control of these factors is to have the 
same teacher teach an experimental and a control group. The success 
of this method will depend on the degree to which the teacher carries 
out both the experimental and control instructional procedures with 
equal skill and zeal. In order to approach control when a single 
teacher is used, or when two teachers exchange groups at the mid- 
point of the experiment, it might be suggested that teachers be prac- 
ticed in the experimental procedures before the start of the experi- 
ment, and enough of the scientific attitude be engendered in them 
to overcome the operation of prejudice for any particular method. 
When a teacher develops a preference for one of the procedures, he 
becomes disqualified for the experiment. 
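The argument of the preceding paragraphs can be made concrete with a toy additive model. It is a sketch under stated assumptions, not a model proposed in this bulletin: the gain a group makes in each half of the experimental period is taken to be a method effect, plus the teacher's general skill and zeal, plus any extra skill or zeal that teacher shows with one particular procedure. All numbers and names below are hypothetical. The model shows that both the rotation plan and the single-teacher plan cancel a general difference between teachers, but neither cancels a teacher's preference for one procedure.

```python
"""Toy additive model of two plans for controlling teacher skill and zeal.

Hypothetical assumption: a group's gain in one half of the experiment is
(method effect) + (teacher's general skill and zeal)
+ (extra skill or zeal that teacher shows with that particular method).
"""


def half_gain(teacher, method, method_effect, skill, preference):
    """Gain made by a group during one half of the experimental period."""
    return (method_effect[method]
            + skill[teacher]
            + preference.get((teacher, method), 0.0))


def rotation_estimate(method_effect, skill, preference):
    """Teachers A and B exchange groups at the mid-point of the experiment."""
    exp_gain = (half_gain("A", "exp", method_effect, skill, preference)
                + half_gain("B", "exp", method_effect, skill, preference))
    ctl_gain = (half_gain("B", "ctl", method_effect, skill, preference)
                + half_gain("A", "ctl", method_effect, skill, preference))
    return exp_gain - ctl_gain


def single_teacher_estimate(method_effect, skill, preference, teacher="A"):
    """One teacher instructs both groups during both halves."""
    exp_gain = 2 * half_gain(teacher, "exp", method_effect, skill, preference)
    ctl_gain = 2 * half_gain(teacher, "ctl", method_effect, skill, preference)
    return exp_gain - ctl_gain


def true_difference(method_effect):
    """Difference due to the experimental factor alone, over both halves."""
    return 2 * (method_effect["exp"] - method_effect["ctl"])


if __name__ == "__main__":
    effects = {"exp": 3.0, "ctl": 2.0}
    skill = {"A": 1.5, "B": 0.0}   # A is more skillful, equally so with both methods
    print(true_difference(effects))                          # 2.0
    print(rotation_estimate(effects, skill, {}))             # 2.0
    print(single_teacher_estimate(effects, skill, {}))       # 2.0
    biased = {("A", "exp"): 1.0}   # A teaches the experimental method better
    print(rotation_estimate(effects, skill, biased))         # 3.0
    print(single_teacher_estimate(effects, skill, biased))   # 4.0
```

In this sketch the general difference in skill between A and B disappears from both estimates, while a preference of 1.0 by teacher A inflates the rotation estimate by 1.0 and the single-teacher estimate by 2.0, which is the failure "except by chance" described above.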

Another difficulty is introduced when we recognize the require- 
ment that the teaching represent sound educational practice. This 
requirement involves the provision that the teacher believe in the 
method he is employing rather than be indifferent or even open- 
minded toward it. In fact, sound educational practice probably re- 
quires that the teacher be prejudiced in favor of the method he is 
employing. If this is admitted, it is apparent that an experimenter 
should not expect to secure equivalence of skill and zeal by the rota- 
tion method or by having a teacher teach both an experimental group 
and a control group. 

Since “personality traits” can not be measured satisfactorily, it is 
impossible to determine accurately the status of teachers with refer- 
ence to this factor. Marked differences probably can be discovered, 
but in general it is not possible with our present techniques to select 
teachers who are equivalent with reference to “personality traits.” 
Control may be secured by the rotation method or by having the 
same teacher instruct both a control group and an experimental group. 
Although these procedures will secure control of “personality traits,” 
they are not completely satisfactory for reasons pointed out in the 
preceding paragraphs. 

II. General school factors that affect pupil achievement. Pupil 
achievement is affected directly or indirectly by several general school 
factors. For example, it is generally assumed that the textbook used 
in a course influences the achievement of the pupils. Much of this 
influence is indirect. The character of the text influences the learning
exercises assigned, which in turn influence achievement. In the
following list of general school factors no attempt is made to indicate 
whether a factor functions directly or indirectly. 


1. Instructional materials (textbooks, library, maps, laboratory apparatus, etc.)
2. Time devoted to learning activity
3. Characteristics of the class as a group
4. Size of class
5. Size of school
6. School organization
7. Administration and supervision
8. School building, especially lighting, heating, and ventilation


1. Instructional materials, such as textbooks, libraries, and other 
school equipment, influence the learning activity of pupils through 
the learning exercises that they furnish or make possible. Texts in 
arithmetic, algebra, language, physics, and most of the other sub- 
jects furnish a number of learning exercises. Texts and other books 
make possible other learning exercises, such as requests to study 
certain pages or questions whose answers may be found by reading. 
In a similar way charts, maps, moving picture machines, laboratory 
apparatus, and the like affect the number and type of learning exer- 
cises that may be assigned. Hence, the achievement of the pupils is 
likely to be affected by the instructional materials used with a class. 

The intimate relation between instructional materials and learning 
exercises may make it impossible to have the former constant when 
the latter are greatly different. It should be noted, however, that cer- 
tain types of learning exercises require certain instructional materials. 
Hence, if the purpose of an experiment is to compare two types of 
learning exercises, such as the lecture-demonstration method of teach- 
ing a science and individual-laboratory work, the materials must differ. 
In such cases, the difference in instructional materials is essentially a 
phase of the experimental factor. 

2. If the time devoted to learning activity is assumed to be an 
approximate index of the amount of exercise of modifiable connections, 
it is apparently an important educative factor. In considering its im- 
portance in experimental investigations two cases should be noted: 
(1) the long-time experiment in which the difference in time spent is 
due to absences of certain pupils, and (2) the experiment in which the 
difference in time spent is a difference in the length of the class period 
or in the total time devoted to study. 
In a study reported by Odell, a slight positive correlation was
found between average school marks and per cent of time in attend- 
ance.s* The relationship of length of attendance to educational age is 
shown in the coefficient of correlation of .0 + 2 reported by Den- 
worth.’8 In view of these relatively low correlations it would seem 
plausible to say that if there were no extreme cases of inattendance, 
or irregular attendance, and if the absences were approximately bal- 
anced, attendance would be an insignificant factor. 

In the second case, it seems reasonable to expect that the time 
spent in learning activity is an important educative factor. Experi- 
mentation on the distribution of practice in learning has shown that 
there are optimum lengths of practice periods for different types of 
learning. Pyle®® in his substitution experiment used fifteen-, thirty-, 
forty-five-, and sixty-minute practice periods. His results were in 
favor of the thirty-minute period. The experiments of Hahn and 
Thorndike,®® Kirby,®? Starch,®? and Lyon® indicate that the length of 
the practice period is a factor in learning. The evidence just cited 
tends to show that more time spent in learning activity does not nec- 
essarily imply more learning. Up to a certain point, increase of the 
learning period may be beneficial to learning; beyond this point, in- 
crease may be detrimental. It is probable that this sort of thing 
operated in the investigations of Rice,°* Heck,® Jones and Ruch,®* and 
Barnes and Douglass®*’ who found little relation between time spent 
in learning activity and achievement. It should be noted that “time 
spent in learning activity” needs definition in this connection. Think- 
ing and talking about an assignment probably should be included as 
well as formal study, either at school or at home. If this thesis is 
accepted, the control of the time spent in learning activity frequently
will be difficult. The evidence cited from the experimentation on dis- 
tribution of practice is sufficient to warrant the assertion that the 
experimenter should use whatever means are available to secure, so 
far as possible, an equal amount of time each day to be spent in learn- 
ing activity, both in recitation and study, by the experimental and 
control pupils. 

3. The phrase “characteristics of the class as a group” is used to designate
a factor that is difficult to define. It includes what is commonly
called esprit de corps. It does not include general intelligence and the
other factors listed on pages 19 and 20, since the two groups are
expected to be equivalent in these respects. Rivalry among certain members
of a class may stimulate the entire group to greater effort. Because
the pupils like each other, or because of outside associations, a
teacher may prefer to instruct one class rather than another and, hence,
may exhibit unusual zeal in his work. On the other hand, the members
of a class may not like each other. There may even be petty jealousies 
and enmities that make the teacher’s task unusually difficult. The 
characteristics of the class as a group constitute a very intangible 
factor that operates in subtle ways. It is, however, of sufficient im- 
portance to warrant the consideration of the careful experimenter. 

4. The size of the class disappears as an educative factor in an 
experiment where equivalent groups are secured by pairing, since this 
procedure secures classes of equal size. If the two groups are not 
equal in size, small differences do not appear to be significant because 
within fairly wide limits size of class does not appear to be an im- 
portant educative factor.*® 

Incidentally, it may be noted that when generalizing from an ex- 
periment with classes of a given size, the conclusions may be expected 
to be applicable to classes of other sizes within a considerable range, 
provided the size of the class does not affect other educative factors. 

5. The size of the school indirectly affects the achievement of its 
pupils. Larger schools tend to possess superior organizations, better 
qualified administrative, supervisory, and instructional staffs, and a 
greater diversity of school equipment and, hence, probably provide a
better environment for learning. Inferiority in these things has caused 
Rufi®® to question the efficiency of the small high school. However, 
in spite of the diversity between small and large schools in these 
things, it is yet to be proven that school size is anything more than a 
minor factor in the achievement of the pupils. Gowen and Gooch’? 
compared the average college grades of students from large high 
schools with those of students from small high schools and failed to 
find a significant difference. Size of school, in itself, does not seem 
to be anything but a very minor factor in learning activity. As long 
as small size does not mean different organization, less qualified ad- 
ministrators and teachers, or lack of the materials of instruction pre- 
scribed in the experiment, it may be neglected by the experimenter 
even when the two groups compared are in different schools. 

6. It seems reasonable that the organization of the school is im- 
portant enough to be considered when experimental and control groups 
are in different schools. For example, schools in which there is in- 
dividual instruction, ability grouping, or a platoon system are not 
appropriate environments for experimental groups unless the control 
groups are subject to the same conditions. To illustrate the im- 
portance of school organization, the conclusion of an investigation 
in which the achievement of equivalent rural- and city-school children 
was compared may be given: 


The results of this study indicate that the progress of graded-school pupils 
was approximately one-half school year in advance of that of the pupils with 
whom they were paired from the rural schools.*™ 


Since the pupils were equivalent in intelligence and chronological 
age, the difference in achievement must be ascribed, at least in part, 
to the superior organization of the graded city schools. 

7. The administration and supervision of a school must be an 
important factor in learning activity if the attention given to these 
fields in teacher-training institutions is any criterion. However, it 
is difficult, if not impossible, to find any quantitative evidence in 
regard to the contribution of this factor to classroom learning. The 
reason for this seems to lie in the fact that any influence exerted by 
administration or supervision must be an indirect one operating 
through the teacher, the course of study, the organization of classes, 
the provision of school equipment, and so on. It seems logical to
assume that the educational experimenter who has controlled these 
more direct factors will have taken care of administration and super- 
vision. 

8. The school building is a factor in school achievement in that it 
provides an environment that may be beneficial or detrimental to 
learning. While there is no experimental evidence, it seems logical 
to assume that learning takes place more readily in the beautiful and
appropriate school buildings now in existence than in the ugly and 
inconvenient structures of a generation ago. The importance of lighting
is recognized in the weight given to it in building score cards.1°2
While the chemical composition of the air is usually so constant as 
to be unimportant, its temperature, humidity, and movement spell 
comfort or discomfort to pupils in a classroom and through this in- 
fluence achievement.?°? Two investigations in which Thorndike has 
participated minimize these factors of ventilation.1°* However, lest 
the inference be made that ventilation is a wholly unimportant factor
because of these findings, Sandiford has made the following comment:


Apparently we can, if we will, work as hard under adverse conditions of 
heat and humidity as under favorable ones. Even summer school in New York 
or Timbuctoo need not daunt us! What should be noted, however, is that these 
distressing conditions are uncomfortable, and if we subject children to them 
the likelihood is that their attention will be distracted from work.” 


It may be stated that light, heat, and ventilation become important 
factors to the educational experimenter only when grossly abnormal 
conditions prevail. When such conditions are avoided, they probably 
are not factors of sufficient importance to warrant the attention of the 
experimenter. 

The control of general school factors. When the two groups of 
pupils are within the same school, the most significant general school
factors appear to be (1) instructional materials, and (2) time de- 
voted to learning activity. The control of instructional materials as 
a non-experimental factor is accomplished by securing an identity 
of instructional materials for both the experimental and control groups.
Reeder secured such identity by using the same textbook in both
groups and permitting access to the textbook only during the class
period. It goes without saying that pupils should not only have the 
same opportunities with respect to a textbook, but should have equal 
access to supplementary material, such as reference books, maps, 
charts, and museums, as well. This statement, of course, applies only 
to those instructional materials that are not involved in the experi- 
mental factor. 

Securing equivalence of time spent in learning activity demands 
that the length of the class period and the number of periods spent 
on the experimental learning be the same for both groups. It necessi- 
tates also that the experimental and control pupils spend an equal 
amount of time in study whether in school or at home. This is 
probably best accomplished by having all the pupils study the ex- 
perimental learning for the same length of time under the super- 
vision of the study-hall director, or possibly their classroom teacher, 
or teachers. If the experimental and control pupils are to engage in 
the experimental learning at home, all of them should do so. The 
control of the time factor in this case may be approached by securing 
the cooperation of the parents of the children. Equivalence with respect
to the time factor also demands that, at the close of the experiment,
the experimenter check the amount and regularity of attendance
of the pupils in the experimental and the control groups.
When the attendance of either pupil of a pair is grossly deficient or 
irregular, the pair probably should be discarded. 
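The attendance check just described can be sketched as a simple filter over the pairs. The bulletin does not say what counts as "grossly deficient or irregular," so the two limits used here (attending at least 80 per cent of the days, and no more than six separate absences) are hypothetical, as are the field names and the sample pupils.

```python
from dataclasses import dataclass


@dataclass
class Pupil:
    name: str
    days_present: int
    separate_absences: int  # hypothetical irregularity index: number of distinct absence spells


@dataclass
class Pair:
    experimental: Pupil
    control: Pupil


def attendance_acceptable(pupil, days_in_experiment, min_fraction=0.8, max_absences=6):
    """Hypothetical test for attendance that is not grossly deficient or irregular."""
    fraction = pupil.days_present / days_in_experiment
    return fraction >= min_fraction and pupil.separate_absences <= max_absences


def retained_pairs(pairs, days_in_experiment, **limits):
    """Keep a pair only if BOTH of its pupils attended acceptably."""
    return [
        pair for pair in pairs
        if attendance_acceptable(pair.experimental, days_in_experiment, **limits)
        and attendance_acceptable(pair.control, days_in_experiment, **limits)
    ]


if __name__ == "__main__":
    pairs = [
        Pair(Pupil("Ann", 58, 1), Pupil("Ben", 57, 2)),
        Pair(Pupil("Cal", 40, 3), Pupil("Dot", 59, 0)),  # Cal: deficient attendance
        Pair(Pupil("Eve", 55, 9), Pupil("Fay", 56, 1)),  # Eve: irregular attendance
    ]
    kept = retained_pairs(pairs, days_in_experiment=60)
    print([(p.experimental.name, p.control.name) for p in kept])  # [('Ann', 'Ben')]
```

Discarding the whole pair, rather than only the absent pupil, keeps the two groups matched member for member, which is the point of the pairing.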

The characteristics of the class as a group should not be neglected, 
but little can be done to control this factor. If equivalent groups are 
secured and the teacher factors are adequately controlled, it is likely 
that the two classes will not differ greatly in their characteristics as 
a group. For example, if the pupils are equivalent with respect to 
intelligence and previous achievement, both the experimental and 
control pupils will have the same advantages in profiting from the 
recitations of their fellows. If the teacher, or teachers, instruct the 
two groups with equal skill and zeal, similar group attitudes should 
be engendered. The experimenter, however, should endeavor to evalu- 
ate the two classes with respect to their group characteristics, and, 
if differences are apparent, the fact should be recognized in interpret- 
ing the results of the experiment. 

Reeder, E. H. “A Method of Directing Children’s Study of Geography,”

The size of the class is automatically controlled as a factor when
classes are formed by pairing pupils. The presence of pupils not 
actually included in the experiment, as is sometimes the case when 
groups are selected without interfering with the composition of regular 
school classes, will not interfere with the experiment unless the num- 
ber of such pupils is large. Where a number of paired groups are 
used, it is probable that none of the pairs should differ greatly in size 
if the results are to be combined or compared. 

The size of the school does not influence one group more than the 
other, if both are within that school. When experimental and control 
groups are to be in different schools, the experimenter should select 
schools that are approximately the same size. In the control of this 
and other general school factors in cooperative experimentation where 
several schools participate, a measure of control is attained by having 
each experimental group paired with a control group in the same
school. The cooperative experiment of Breed in which fourteen schools 
cooperated is an example of this.1°7 


III. Extra-school factors that affect pupil achievement. Pupil 
achievement is affected by several factors that have not been included 
in the preceding lists. The following appear to deserve consideration: 

1. Participation in extra-curricular activities 
2. The pupil’s home life 
3. Community interest in and attitude toward the school 

1. Carefully supervised participation in extra-curricular activities 
tends to be beneficial rather than detrimental to learning.’’°* The 
pupil who engages in some such activity frequently becomes more 
interested in his regular school work. Dramatic, scientific, technical, 
literary, and debating clubs not only add interest to the school sub- 
jects to which they are related but they also may contribute directly 
to achievement. It is likely, however, that for each child, there is 
an optimum amount of participation above which his school achieve- 
ment will suffer. 

2. The child’s home life may influence his school achievement in 
many ways. Studying at home under parental supervision and with 
parental sympathy, listening to conversation of parents and other 
members of the family, reading periodicals and books that the home 
affords, traveling with members of the family, and the like are activi-
ties that sometimes make large contributions in the fields of school 
achievement. The following quotations indicate that the parents of 
children have been regarded as an important factor in school achieve- 
ment, particularly achievement that results from home study: 


Home environment is a factor in the formation of study habits. Its influence 
may be either for good or for bad..... 

Home study is desirable because it acts as a check on the formation of 
habits out of school that would be negative in their influence on habits in 
school.” 

The survey of the fourth, fifth, and sixth grades seems to justify this con- 
clusion: Where the parents are capable of guiding the child and are inclined 
to supervise the home study, their children succeed in school. But where parents 
are illiterate or for other reasons are unable or unwilling to supervise the home 
study, their children as a rule either make slow progress or are failures entirely 
when measured by the progress of their companions in school.” 


It is probable that the conclusion of Brooks exaggerates the importance
of supervision of home study by parents, since Heck’ has
presented data to show that it is immaterial whether students study 
at home or at school. The inconclusiveness of the research on super- 
vised study would lead one to question the value of the type of super- 
vision most parents are capable of giving. It is possible that the at- 
titude parents take toward the school as an educative agency is a 
more potent influence than any supervision they may administer. For 
example, Hurlock’? has shown that praise of pupils by teachers is 
much more beneficial to achievement than reproof or indifference. It 
is probable that the same is true of praise or reproof on the part of 
parents relative to the school work of their children. Reavis'!* has 
described several interesting cases in which failure in school achieve- 
ment was due to detrimental parental attitudes whose correction, 
effected by enlisting the cooperation of the parents, resulted in the 
change from failure to success. 

Listening to conversation of parents and other members of the 
family, reading periodicals and books that the home affords, and 
traveling with members of the family are activities that may con- 
tribute to the experimental learning. These experiences help pro- 
vide the background of information for the learning that is to take 
place during the experiment. Topics in history, civics, biology, litera-
ture, and economics are more meaningful to the pupil who has had 
related experiences through conversation with members of his family, 
or through travel. It is impossible to estimate the extent to which 
school achievement is influenced by these out-of-school experiences. 

Several recent studies have minimized the importance of home 
environment with respect to school achievement. For example, Heil- 
man’s statement that “57% of the variation in educational age was 
due to mental age or such hereditary factors as had been measured;
about 7% of the variation was due to the influences of school training
and socio-economic status combined, or such environmental factors
as had not been measured”** would lead one to believe that home
environment is not a significant factor. However, it is yet to be 
proven that the information acquired and the ideals and attitudes 
engendered in the home influence school achievement, as represented 
in the experimental learning, so little that the experimenter is justified 
in neglecting this factor. 

3. School achievement is influenced by community interest in and 
attitude toward the school. If the community is high in the socio- 
economic scale, the members of the community are likely to show 
much interest in school affairs and to cooperate with the principal 
and teachers in attaining the best conditions for school work. For 
example, the parents of such a community may cooperate with the 
school faculty in providing more adequate library facilities. In other 
cases the community may be permeated with attitudes antagonistic 
toward the school administration. Such attitudes among parents tend 
to be acquired by pupils. Thus, community attitudes and interest may 
exert a subtle though powerful influence on school learning. 

The control of extra-school factors. The participation in extra- 
curricular activities should be checked by means of information se- 
cured from teachers, school records, or the pupils themselves. The 
experimenter probably has controlled this factor satisfactorily when 
he has insured that there is no great excess of participating students 
in either group. This factor appears to be a minor one; therefore 
small differences in participation may be neglected. 

4 Heilman, J. D. “Factors Determining Achievement and Grade Location,” The Pedagogical
Seminary and Journal of Genetic Psychology, 36:453, September, 1929.

The relationships found to exist between the intelligence of children
as measured by typical intelligence tests and parental occupation
and social status would lead one to believe that pairing children on
the basis of intelligence scores helps to secure equivalence with respect
to the home life of the children used in the experiment.” Control
of this factor is also aided by securing equivalence with respect to 
previous achievement. If the initial achievement tests are valid and 
reliable with respect to the experimental learning, and if the groups 
are equivalent with respect to the mean scores on the initial achieve- 
ment test, then it seems probable that the groups will be equivalent 
so far as influence from information obtained out of school is concerned.
The inference should not be made that the child’s home life
is considered an insignificant factor, or that securing equivalence
with respect to intelligence is all that needs to be done. The experimenter
should be on the alert to detect cases in which an abnormal home
environment, particularly detrimental parental attitudes, is handicapping
the learning of individual children in the groups.
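Pairing on intelligence scores, followed by a check that the groups agree in their mean scores on the initial achievement test, might be sketched as follows. The pairing rule shown (rank pupils by intelligence score, pair neighbours, and let a coin toss decide which member goes to the experimental group) is one plausible procedure assumed for illustration, not necessarily the one described earlier in this bulletin; the tuple layout and function names are hypothetical.

```python
import random
from statistics import mean


def form_pairs(pupils, rng):
    """Pair pupils of adjacent intelligence scores; assign each pair by chance.

    pupils: list of (name, intelligence_score, initial_test_score) tuples.
    If the number of pupils is odd, the lowest-scoring pupil is left unpaired.
    """
    ordered = sorted(pupils, key=lambda p: p[1], reverse=True)
    experimental, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        first, second = ordered[i], ordered[i + 1]
        if rng.random() < 0.5:  # coin toss decides which member is experimental
            first, second = second, first
        experimental.append(first)
        control.append(second)
    return experimental, control


def mean_difference(experimental, control, column):
    """Experimental mean minus control mean (column 1 = intelligence, 2 = initial test)."""
    return mean(p[column] for p in experimental) - mean(p[column] for p in control)


if __name__ == "__main__":
    pupils = [("P1", 120, 41), ("P2", 118, 44), ("P3", 110, 35), ("P4", 109, 37),
              ("P5", 100, 30), ("P6", 99, 28), ("P7", 90, 22)]
    exp, ctl = form_pairs(pupils, random.Random(1))
    print("intelligence:", mean_difference(exp, ctl, 1))
    print("initial test:", mean_difference(exp, ctl, 2))
```

Because each pair differs only by the gap between neighbouring scores, the difference in mean intelligence can be no larger than the average of those gaps; equivalence on the initial achievement test is not guaranteed by the pairing and has to be checked separately, as the text requires.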

The factor of community interest in and attitude toward the school 
will not usually demand attention if the experiment is confined to one 
school, or if there is a pair of groups in each one of several schools. 
If the experimental and control groups are in different schools, in 
different communities, this factor should receive attention. In such 
cases, however, it seems probable that pairing pupils on the basis of 
intelligence will do much to insure control of this factor, since it is 
likely that community interests and attitudes tend to vary with the 
intellectual level of the children. It is probably desirable that the 
presence of the experiment should not be given too much publicity, 
lest the parents and the other members of the community take an unwelcome
interest in the experimental or control procedures. It is desirable,
for the sake of generalization, that the communities in which
experiments are conducted be typical of communities to which the 
results are to be applied. 


by the experimenter that the factors operate equally in both the experimental and the control
groups.

Summary with reference to control of educative factors. In the
light of the preceding discussion and of practical considerations, it
appears that equivalence’® of at least the following educative factors
must be secured, or appropriate allowance for non-equivalence
must be made in interpreting the difference in gains.

1. Instructional techniques 

2. Skill in carrying out instructional techniques 

3. Zeal of the teacher 

4. Personality traits of teachers 

5. Instructional materials 

6. Time spent in learning activity- 

It should be noted that frequently the experimental factor is a 
phase of instructional techniques. When this is the case, the require- 
ment of equivalence applies only to the remaining phases. A similar 
comment applies to instructional materials. 

Control of instructional techniques and of instructional materials 
can be secured by careful planning and by giving attention to details 
during the experiment. When the learning is restricted to the class- 
room, the control of time spent and of the other environmental factors 
is easily secured, but when the learning involves home study, satis- 
factory control is more difficult. The greatest difficulty is in securing 
satisfactory control of skill, zeal, and personality traits of teachers. 

In addition to controlling the factors enumerated, the other edu- 
cative factors should be investigated to make certain that no marked 
differences exist. If there are significant differences, either the experi- 
ment should be organized so that they neutralize each other, or their 
possible influence must be estimated and corrected for in the interpretation
of the data.¹¹⁷

The problem of controlling educative factors in different types of 
experimentation. 1. Single group experiments. Since it is impossible
to equate non-experimental factors in a single group experiment, con- 
trol of factors must depend on estimations of their influences on the 
experimental learning. The effect due to the application of the ex- 
perimental factor must be singled out from the effect due to all the 
other factors. This is usually done by comparing the achievement of 
the pupils under the influence of the experimental factor with their 
achievement prior to the application of the factor. It is obvious that 
this procedure will not often secure dependable quantitative results. 
In addition to the improvement that must be ascribed to the change 
that has occurred in the intellectual and educational status of the 
pupils, some of the improvement may be due to the less difficult in- 
structional material or to the greater zeal and effort shown by the 
teacher because of the novelty of the experimental method, or both. 


117Factors not equated may be said to be controlled when their variation is determined
and the effect recognized in the difference in gains. 
The single group experiment is a desirable activity for the classroom
teacher to engage in since it is likely to be stimulating, but in general 
it cannot be expected to result in dependable answers to educational 
problems. 

2. Experiments in which two equivalent groups are taught by the 
same teacher. The control of the non-experimental factors in experi- 
ments in which two equivalent groups are taught by the same teacher 
is dependent on the degree to which conditions are arranged so that 
these factors operate equally in both groups. The procedures to be 
used for securing such equivalence of factors have been suggested in 
the preceding pages. Since this type of experiment is conducted by a 
single teacher in one school, personality traits, sex, age, and physical 
condition of the teacher, size of school, school organization, admin- 
istration and supervision, school building, and community attitude 
and interest in the school do not need to receive the attention of the 
experimenter, since these are the same for both groups.¹¹⁸ It is
evident, however, that the other factors listed may be of unequal
influence on achievement unless conditions are arranged with care.

After equivalent groups of pupils have been secured, control of 
the teacher factors of skill and zeal is very important. The teacher 
should be equally familiar with the instructional procedures and ma- 
terials used in the experimental group and with the instructional pro- 
cedures and materials used in the control group. The attitudes of 
the teacher with respect to these instructional procedures and ma- 
terials should be such that the teaching is done with equal zeal in both 
groups. In addition, the teacher must exercise constant care through- 
out the experiment in order to maintain an identity of classroom- 
management procedures and time spent in learning activity. The 
teacher must be able to adapt herself readily when teaching one group 
immediately after the other. 

The rotation technique is often employed to secure control of pupil 
factors when the groups are only approximately equivalent.¹¹⁹ The
instructional procedures and materials of the experimental and con- 
trol groups are exchanged at the mid-point of the experiment. In 
computing the results, the gain credited to the experimental factor 
is the sum of the gains of both groups while under its influence. The 
gain credited to the control procedures is the sum of the gains of 
118[...] significant difference for the [...]; these factors must be typical of the schools to which
the generalizations are to be applied.

119If Group A acts first as experimental and second as the control group, while Group B
acts first as control and second as experimental group, two hypothetically equivalent groups are
secured. Group A (as experimental) plus Group B (as experimental) is equivalent to Group A
(as control) plus Group B (as control). In other words, the pupils are equivalent to themselves.
both groups while acting in the capacity of controls. The use of
this technique, however, may introduce errors of more significance 
than those it would seek to eliminate. The group that first receives 
the benefit of the experimental factor is likely to acquire abilities, 
such as study habits, that will carry over and function when the 
group is acting as control. What is likely to happen is shown in 
the following illustrations in which the true gain is assumed to be 
8 when the experimental instructional procedures are used. The true 
gain is assumed to be 4 when the control procedures are used. The 
experimental instructional procedure is labeled Method X, and the 
control procedure, Method Y. Then for the hypothetical ideal situa- 
tion the gains are as follows: 
           Gain
Group A    8    with Method X
Group B    4    with Method Y
Group B    8    with Method X (after rotation)
Group A    4    with Method Y (after rotation)
Difference = (8 + 8) - (4 + 4) = 8 in favor of Method X
Assume that the carry over of study habits by Group A intro- 
duces an error of 3: 
           Gain
Group A    8            with Method X
Group B    4            with Method Y
Group B    8            with Method X
Group A    7 (4 + 3)    with Method Y plus study habits
Difference = (8 + 8) - (4 + 7) = 5 in favor of Method X
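The arithmetic of pupil rotation can be checked mechanically. The Python sketch below is illustrative only: `rotation_difference` is a hypothetical helper, and the gains are the assumed values from the illustrations (8 under Method X, 4 under Method Y, and a carry-over of 3 into Group A's control period).

```python
def rotation_difference(gains_under_x, gains_under_y):
    """Difference credited to Method X under pupil rotation: the sum of the
    gains made while under Method X minus the sum made while under Method Y."""
    return sum(gains_under_x) - sum(gains_under_y)


# Assumed true half-period gains, taken from the illustrations above.
TRUE_X, TRUE_Y = 8, 4

# Ideal case: each group shows the true gain under each method.
ideal = rotation_difference([TRUE_X, TRUE_X], [TRUE_Y, TRUE_Y])

# Carry-over case: Group A used Method X first and brings study habits
# worth 3 points of gain into its half under Method Y.
CARRY_OVER = 3
with_carry_over = rotation_difference([TRUE_X, TRUE_X], [TRUE_Y, TRUE_Y + CARRY_OVER])

print(ideal, with_carry_over)  # 8 5
```

Because the carry-over is added to a gain credited to the control procedure, it reduces the computed difference by exactly its own size, and nothing in the rotated data reveals that it occurred.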
The effect of this error plus the effects of others combining with 
it in unknown ways may be sufficient to destroy the significance of 
results, especially if the computed difference in gains happens to be 
small. If the teacher varies in skill or zeal, the use of the technique 
of rotating pupils is not likely to eliminate the errors created. Let 
us assume that the teacher prefers the experimental factor to the 
extent that the error introduced is equal to one-half the influence due 
to the experimental factor, or Method X, and at the same time let 
us assume that her dislike for the control procedures, or Method 
Y, is also sufficient to cause an error equal to one-half of the influence 
due to Method Y.
           Gain
Group A    12 (8 + 4)   with Method X plus preference
Group B    2 (4 - 2)    with Method Y plus dislike
Group B    12 (8 + 4)   with Method X plus preference
Group A    2 (4 - 2)    with Method Y plus dislike
Difference = (12 + 12) - (2 + 2) = 20 in favor of Method X


For the sake of comparison let us assume that instead of pre- 
ferring the experimental instructional procedure, Method X, the 
teacher dislikes it and prefers the control procedure, Method Y.
Further, let us assume that the dislike of Method X removes half of
its effectiveness and the preference for Method Y increases its
effectiveness by one-half. Then:
           Gain
Group A    4 (8 - 4)    with Method X plus dislike
Group B    6 (4 + 2)    with Method Y plus preference
Group B    4 (8 - 4)    with Method X plus dislike
Group A    6 (4 + 2)    with Method Y plus preference
Difference = (4 + 4) - (6 + 6) = -4, that is, 4 in favor of Method Y

Thus the failure of the rotation technique to control the teacher 
factor may result in exaggerating the influence of this experimental 
instructional procedure, or it may result in creating an apparent differ- 
ence in favor of the really less desirable control instructional pro- 
cedures. When it is remembered that failure to control the teacher 
factor may be accompanied by error due to carry over of abilities, 
it will be recognized that the rotation technique does not insure de- 
pendable results. 
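The teacher-attitude cases can be sketched in the same way. This Python sketch assumes, as the illustrations do, that the teacher's attitude multiplies every gain made under a method by a fixed factor (1.5 for preference, 0.5 for dislike); both the multiplicative model and the function name are illustrative assumptions.

```python
def difference_with_teacher_attitude(true_x, true_y, factor_x, factor_y):
    """Pupil rotation in which one teacher's attitude scales every gain made
    under a method (hypothetical multiplicative model)."""
    gain_x = true_x * factor_x  # gain of either group while under Method X
    gain_y = true_y * factor_y  # gain of either group while under Method Y
    # Group A and Group B each spend one half of the experiment under each method.
    return (gain_x + gain_x) - (gain_y + gain_y)


no_bias = difference_with_teacher_attitude(8, 4, 1.0, 1.0)
prefers_x = difference_with_teacher_attitude(8, 4, 1.5, 0.5)  # likes X, dislikes Y
prefers_y = difference_with_teacher_attitude(8, 4, 0.5, 1.5)  # dislikes X, likes Y
print(no_bias, prefers_x, prefers_y)  # 8.0 20.0 -4.0
```

Since the same teacher's attitude enters both groups' gains under each method, rotating the pupils gives the bias nothing to cancel against.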

3. Experiments in which two equivalent groups are taught by 
different teachers in the same school. The control of non-experimental 
factors when equivalent groups are taught by different teachers in 
the same school is very similar to the control of factors when both 
groups are taught by the same teacher. There is no need to give 
attention to such factors as size of school, school organization,
administration and supervision, school building, and community
attitude and interest, since these are the same for both groups.¹²⁰ The
fact that different teachers are used increases the importance of the 
teacher factors. In order that both teachers will be equal in their 
influence on achievement, irrespective of the experimental factor, they 
must teach with equal “skill” and “zeal” in carrying out instructional 
techniques and classroom-management procedures. The experimenter 
may seek to secure equality of these teacher factors by selecting 
teachers who have approximately the same intelligence, training, and 
experience, and who are not widely different in age or physical condi- 
tion. After teachers have been selected on the basis of equality, or 
similarity, in the above characteristics, more adequate control of 
skill and zeal may be attempted by practicing the teachers in the 
instructional procedures and materials of the experiment. In doing 
this, the experimenter should be especially careful to engender scien- 
tific attitudes toward the instructional procedures and materials in an 
120[...] rooms within a building vary and [...]
experiment as being carried on are not significantly different. Care should also be exercised
to insure that the community attitude toward the experiment is neutral.
effort to minimize the influence of teacher preferences, or dislikes, for
methods or materials. Finally, the use of detailed lesson plans in 
both groups during the experiment should be effective as an aid in the 
control of the teacher factors. 

“Personality traits” must receive attention; however, it is difficult 
to select teachers who are equivalent with respect to this factor. A 
principal or other supervisor who is intimately acquainted with the 
teachers of a school may select two teachers who are approximately 
equivalent, but since we have no satisfactory means of measuring this 
factor, the degree of equivalence cannot be determined.

The rotation of teachers is a technique frequently used to control 
the teacher factors. At the mid-point of the experiment the teachers 
exchange groups and procedures. Group A with Method X continues 
with Method X but with the new teacher. Group B with Method Y
continues with Method Y but with the new teacher.¹²¹ It is probable
that the use of this technique eliminates such lesser teacher factors 
as age, sex, physical condition, and personal idiosyncrasies. It does 
not seem likely, however, that rotation of teachers adds anything to 
the control of the important factors, skill and zeal. For example, let 
8 be the gain due to Method X for one-half the experimental period, 
and let 4 be the gain due to Method Y for one-half of the experimental 
period. Then for the ideal case the gains are as follows: 


           Gain
Group A    8    with Method X
Group B    4    with Method Y
Group B    4    with Method Y (continued)
Group A    8    with Method X (continued)
Difference = (8 + 8) - (4 + 4) = 8, the "true" difference due to the
superiority of Method X.


Now let us assume that both teachers are unskilled in the use of 
Method X so that half of its effectiveness is lost. 


           Gain
Group A    4 (8 - 4)    with Method X plus lack of skill
Group B    4            with Method Y
Group B    4            with Method Y
Group A    4 (8 - 4)    with Method X plus lack of skill
Difference = (4 + 4) - (4 + 4) = 0
Again, let us assume that both teachers are equally more zealous 
for the experimental factor, although they are equally skilled in the 
use of both methods. 


121Both teachers and pupils might be rotated, in which case the hazards described in
the previous discussion of rotation of pupils would be accompanied by those described in the
present discussion.
           Gain
Group A    12 (8 + 4)   with Method X plus zeal
Group B    4            with Method Y
Group B    4            with Method Y
Group A    12 (8 + 4)   with Method X plus zeal
Difference = (12 + 12) - (4 + 4) = 16
Finally, let us assume that, although the two teachers are equally 
skilled in both methods, one is zealous for the experimental factor, 
while the other is equally prejudiced against it. 


           Gain
Group A    12 (8 + 4)   with Method X plus zeal
Group B    4            with Method Y
Group B    4            with Method Y
Group A    4 (8 - 4)    with Method X plus prejudice
Difference = (12 + 4) - (4 + 4) = 8. This happens to be the true
difference, but notice the conditions necessary for these two factors
to cancel each other.


Thus the rotation of teachers may fail to eliminate the error due 
to the teacher factors—skill in the use of instructional procedures and 
zeal of the teacher with reference to the experimental factor. The 
rotation technique, whether of pupils, or teachers, or both, is of 
doubtful desirability since its use does not give more certain control 
than when rotation is not used. It is a dangerous technique to employ 
in any form, since it may engender a false idea that by its use non- 
experimental factors are controlled, and because rotation of pupils 
or teachers, except at the end of a term or semester, creates an ab- 
normal situation. 
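The teacher-rotation cases can be checked the same way. In this Python sketch the hypothetical `teacher_rotation_difference` adds a fixed amount (positive for zeal, negative for lack of skill or prejudice) to Group A's half-period gain under Method X according to which teacher is in charge; as in the illustrations, gains under Method Y are assumed unaffected.

```python
def teacher_rotation_difference(true_x, true_y, first_teacher_effect, second_teacher_effect):
    """Teacher rotation: Group A keeps Method X and Group B keeps Method Y while
    the teachers exchange groups at the mid-point. Each *_teacher_effect is the
    hypothetical amount the teacher in charge adds to Group A's half-period gain."""
    group_a = (true_x + first_teacher_effect) + (true_x + second_teacher_effect)
    group_b = true_y + true_y  # Method Y gains assumed unaffected
    return group_a - group_b


cases = {
    "ideal": (0, 0),
    "both unskilled in Method X": (-4, -4),
    "both zealous for Method X": (4, 4),
    "one zealous, one prejudiced": (4, -4),
}
for label, (first, second) in cases.items():
    print(f"{label}: D = {teacher_rotation_difference(8, 4, first, second)}")
```

Only the last case reproduces the true difference of 8, and only because the two teacher effects happen to be equal and opposite.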

4. Experiments in which two equivalent groups are taught by 
different teachers in different schools. When the experimental group 
is in one school and the control group is in another under a different 
teacher, the general school and extra-school factors become significant.¹²²
The fact that different schools are used introduces possible
differences in instructional materials, size of school, time spent in 
learning activity (class periods and study periods), school organiza- 
tions, school administration and supervision, school buildings, com- 
munity interests in and attitudes towards the schools, children’s home 
lives, attitudes of homes toward the schools, home facilities for study, 
home duties performed by pupils, and the participation in extra-cur- 
ricular activities. The most effective means of securing control of 
these factors rests in selecting schools that appear to be as much 
alike as possible.¹²³ In other words, schools should be selected that


122The control of pupil and teacher factors is no less important than when a single school
is used.

123If the results are to serve later as the basis of generalization, the schools selected should
also be typical of those to which the generalizations are to apply.
are approximately the same size and in communities of much the same
social and economic status. It would not be desirable to select one 
school that employs ability grouping while the other selected does 
not. It would not be advisable to select one school that has elabo- 
rate library and laboratory facilities, while the other school selected 
does not have these advantages. The control of instructional materi- 
als and time spent in learning activity is most effectively accomplished 
by preparing detailed lesson plans for both the experimental and con- 
trol groups. The possible variation in other factors, such as home 
facilities for study and extra duties performed by the pupils, should 
be investigated and the differences observed used as the basis of a 
limitation placed on the experimental conclusions. 

5. Cooperative experiments. Cooperative experiments may be
conducted in the same, or in different schools. It is considered the
most desirable practice to have the cooperating teachers instruct pairs
of experimental and control groups. In the interpretation of the data 
obtained by cooperative experimentation one of two methods may be 
used. Each pair of groups may be regarded as a sub-experiment, and 
the conclusions of these sub-experiments may be compared with one 
another. The other method of interpretation is that which depends 
on the concept of the cooperative experiment as a single experiment. 
All of the experimental pupils are regarded as composing one large 
experimental group; all of the control pupils are considered as a single 
large control group.¹²⁴ The difference in gains obtained will, of course,
be the average of the differences in gains for all of the paired groups. 
The increase in size of the experimental and control groups by this 
combining of data will reduce considerably the variable errors of 
measurement, validity, and sampling that existed for the individual 
pairs of groups.¹²⁵ It is probable that systematic errors, since they
are likely to vary from one pair of groups to another, will tend to 
offset each other. In other words, they may become variable errors 
and, hence, be more easily accounted for in the statistical treatment of 
results. It is probable that such combination of experimental and 
control pupils will aid in securing more perfect equivalence with re- 
spect to pupil factors and more perfect control of non-experimental 
factors, since departures from control in the several pairs of groups 
may tend to balance each other. It is probable, also, that the com- 
bined group of experimental pupils and the combined group of control 


124It may be desirable to exclude one or more pairs of groups because of gross errors.


For an example of such exclusion, see: [...], R., et al. "The Relative Effectiveness of the Problem
and Lecture Methods of Instruction in Principles of Economics," University of Oregon Publication,
Vol. 1, No. 7. Eugene, Oregon: University of Oregon, 1929, p. 290.
125The next chapter describes the interpretation of experimental data with reference to
these errors.
pupils will be more representative of the pupils to whom the generali-
zations are to apply than one of the small groups would be. The 
combining of data in this way does not guarantee all this, however, 
since it is easily possible for a systematic error of measurement, valid- 
ity, or sampling, or a lack of control of some non-experimental factor 
to run through the measures of all the groups and thus bias the com- 
bined results. For example, all of the teachers might be zealous for 
the new method of procedure that constitutes the experimental factor, 
since to be zealous for it is the mode. Again, all of the teachers might 
be unskilled in the use of the method because of its newness. If the 
cooperative group were all selected from rural schools, or all from 
city schools, representativeness with respect to all children would not 
be increased by combining results. Experimentation by cooperation of 
teachers and schools is eminently desirable, but in order to secure 
dependable results, data from the cooperating groups should be com- 
bined with care if summation of faults is to be avoided. 
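Combining the pairs of a cooperative experiment can be sketched as follows. The gains are hypothetical, and the sketch assumes that each pair has the same number of pupils in its experimental and control groups; under that assumption the combined difference in gains equals the average of the pair differences weighted by the number of pupils in each pair.

```python
from statistics import mean

# Hypothetical individual gains; within each pair the experimental and
# control groups have the same number of pupils.
pairs = [
    ([5, 7, 6, 8], [4, 5, 5, 6]),  # pair 1: 4 pupils per group
    ([3, 4, 2], [3, 2, 2]),        # pair 2: 3 pupils per group
]


def difference_in_gains(experimental_gains, control_gains):
    """D = Gain E - Gain C for one experimental and one control group."""
    return mean(experimental_gains) - mean(control_gains)


# First method: treat each pair as a sub-experiment.
per_pair = [difference_in_gains(e, c) for e, c in pairs]

# Second method: combine all experimental pupils and all control pupils.
all_experimental = [g for e, _ in pairs for g in e]
all_control = [g for _, c in pairs for g in c]
pooled = difference_in_gains(all_experimental, all_control)

# The combined D equals the pair differences averaged with pair-size weights.
sizes = [len(e) for e, _ in pairs]
weighted = sum(n * d for n, d in zip(sizes, per_pair)) / sum(sizes)
print(per_pair, pooled, weighted)
```

An unweighted average of the two pair differences here would be about 1.08 rather than the combined 1.14, so the weighting matters when the pairs differ in size.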


CHAPTER III 
THE INTERPRETATION OF DIFFERENCES IN GAINS 


The general plan of handling experimental data. The general plan 
of handling experimental data may be illustrated by considering an 
experiment involving two groups—one an experimental group and 
the other a control group. The administration of the achievement 
test at the beginning of the experimental period¹ yields scores as
follows: 

For the experimental group: $e_1, e_2, e_3, \ldots, e_n$, whose mean is $E_1$.

For the control group: $c_1, c_2, c_3, \ldots, c_n$, whose mean is $C_1$.

The administration of the test at the end of the experimental 
period yields a second set of scores: 


For the experimental group: $e_1', e_2', e_3', \ldots, e_n'$, whose mean
is $E_2$.
For the control group: $c_1', c_2', c_3', \ldots, c_n'$, whose mean is $C_2$.


The mean gain in achievement made by the experimental group
is $E_2 - E_1$ and is designated by the symbol "Gain E."² The mean
gain in achievement made by the control group is $C_2 - C_1$ and is
labeled "Gain C." The difference in gains, D, is equal to Gain E -
Gain C.
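The general plan can be expressed as a short computation. The Python sketch below uses hypothetical scores for four pupils in each group, and `gains_and_difference` is an illustrative name. It also checks the alternative noted in footnote 2, that averaging the individual gains yields the same mean gain.

```python
from statistics import mean


def gains_and_difference(initial_e, final_e, initial_c, final_c):
    """Return (Gain E, Gain C, D) from initial and final test scores."""
    gain_e = mean(final_e) - mean(initial_e)  # E2 - E1
    gain_c = mean(final_c) - mean(initial_c)  # C2 - C1
    return gain_e, gain_c, gain_e - gain_c


# Hypothetical scores; the k-th entry of each list belongs to the same pupil.
e_initial = [20, 25, 30, 35]
e_final = [28, 31, 36, 45]
c_initial = [21, 24, 31, 34]
c_final = [25, 28, 34, 41]

gain_e, gain_c, d = gains_and_difference(e_initial, e_final, c_initial, c_final)

# Averaging the individual gains gives the same mean gain.
gain_e_from_individuals = mean(after - before for before, after in zip(e_initial, e_final))
print(gain_e, gain_c, d, gain_e_from_individuals)  # 7.5 4.5 3.0 7.5
```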

The problem of interpretation. The problem of interpretation is to 
determine the extent to which the difference in gains, D, may be due 
to imperfections in the experimental procedure and in the measures 
of achievement and, consequently, to determine the extent to which 
the experimenter is justified in interpreting D as indicating the merit 
of the experimental factor. The errors introduced in the measures of 
achievement by the imperfections of the experimental procedure and 
of the measuring instruments are of two kinds: variable and system- 
atic. The effect of the variable errors is described in terms of the 
chances that, if they were eliminated, the difference would have the 
opposite sign. For example, assume that the obtained difference, D, 
is equal to 2.5. If the variable errors were eliminated, D would be 
different—possibly 4.2, possibly 6.4, possibly 0.7, possibly —1.2, 
possibly other values. The correct value cannot be calculated, but, 
if we have certain information about the magnitude of the variable 
errors, we can calculate the chances that the true value of D will fall 
within any interval. In view of the fact that D is an index of the 


1Under certain conditions it is appropriate to omit this initial test.
2This may also be obtained by calculating the individual gains and averaging them. 
merit of the experimental factor, it is obvious that we are primarily
concerned with the chances that the true D may be negative. If the 
calculated D is positive, the true D is more likely to be positive than 
negative; hence, the experimental factor is more likely to be superior 
than inferior. However, the experimenter cannot make a very strong
claim for the superiority of this factor unless the chances for the true 
D being positive are much greater than for it being negative. How 
many times greater they should be in order to justify a claim for the 
superiority is a matter of opinion. The chances are 3 to 1 in favor of 
the true difference being positive when the obtained difference, D, 
is equal to the probable error of measurement, $P.E._{\text{Meas. } D}$, and
slightly greater than 10 to 1 when D is equal to twice $P.E._{\text{Meas. } D}$.
It may seem that these chances, especially 10 to 1, are sufficient bet- 
ting odds to justify a rather strong claim for the superiority of the 
experimental factor. Undoubtedly they do justify some claim for 
superiority, but it is a common practice to require that they be at 
least 369 to 1 in order to call the difference statistically significant³
with reference to the variable errors being considered. This condition
is fulfilled when the difference is equal to or greater than 2.78
times the standard error of the difference, or approximately 4.1 times
the probable error of the difference.
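The odds quoted above follow from the normal curve. The Python sketch below assumes, as the text does, that the variable errors are normally distributed about the obtained D; the names are illustrative. It also converts the 369-to-1 threshold from standard errors into probable errors using the factor .6745.

```python
from statistics import NormalDist

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error
STANDARD_NORMAL = NormalDist()


def odds_true_d_positive(d, standard_error):
    """Odds that the true D is positive, assuming the variable errors are
    normally distributed about the obtained D with the given standard error."""
    p = STANDARD_NORMAL.cdf(d / standard_error)
    return p / (1 - p)


se = 1.0  # work in units of the standard error of the difference
print(round(odds_true_d_positive(1 * PE_FACTOR * se, se), 1))  # D = 1 P.E.
print(round(odds_true_d_positive(2 * PE_FACTOR * se, se), 1))  # D = 2 P.E.

# Multiple of the standard error needed for odds of 369 to 1, and the same
# threshold expressed in probable errors.
z_369 = STANDARD_NORMAL.inv_cdf(369 / 370)
print(round(z_369, 2), round(z_369 / PE_FACTOR, 2))
```

The first two lines print odds of about 3 to 1 and about 10.3 to 1; the threshold comes out near 2.78 standard errors, which is about 4.12 probable errors.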

It should be noted that in addition to determining the degree of 
significance of the obtained difference, D, as indicated in the preced- 
ing paragraph, it is necessary to consider the effect of the systematic 
errors due to imperfections in the experimental procedure and to 
imperfections in the measuring instruments used. When the experi- 
menter desires to generalize from his results, he must consider also 
the extent to which the two groups of pupils are representative of the 
larger group for which he desires to express conclusions. 

If the experimental group is assumed to be equivalent to the con- 
trol group, the specific questions to be considered are: 

1. What allowance⁴ must be made for variable errors in the measures of achievement?

2. What allowance must be made for systematic errors of meas- 
urement not common to all groups of measures of achieve- 
ment? 

3. What allowance must be made for variable errors of validity 
in the measures of achievement? 


3"Significant" and "significance" are technical terms in the field of statistics.

4The allowance for variable errors in the measures of achievement will be expressed in terms of a
standard error of measurement, $\sigma_{\text{Meas.}}$, or of a probable error of measurement, $P.E._{\text{Meas.}}$. The
allowance for chance errors of validity will be expressed in a similar way. The allowance for systematic errors
of measurement or for lack of control of important educative factors will be expressed as an amount to 
be added to or subtracted from the obtained difference. 
4. What allowance must be made for lack of control of important educative factors?

5. In generalizing, what allowance must be made for possible non-representativeness of the groups of pupils?

Although it is necessary to answer these questions separately, the
final interpretation of the difference, D, must be based on the
combined allowance for all causes. For example, if D is equal to or
greater than ten times $P.E._{\text{Meas. } D}$, it does not necessarily follow that
the experimental factor is superior, because the other allowances must
also be considered, and it might happen that the combined effect of
these would be to reverse the sign of D, or at least to make its reversal
not improbable.


The allowance for variable errors of measurement. An approximate
index of the variable errors of measurement in a group of scores may
be determined by giving the test twice⁶ to the pupils included in the
experiment and computing the coefficient of correlation between the
two sets of scores.⁷ This result is called the coefficient of reliability
of the test and is designated in the following discussion by the symbol⁸
$r_{12}$. The coefficient of reliability has been determined for several tests,
and this value of $r_{12}$ may be used, provided the standard deviation of
the scores on which its determination is based is approximately equal
to the standard deviations of the scores in the experiment.

If the standard deviations of the distributions of experimental 
scores are not approximately equal to each other, or to the standard 
deviations of the distributions on which the reliability coefficient was 
based, then the coefficient of reliability should be corrected by means 
of Kelley’s formula for the relation between ranges in obtained scores 
and reliability coefficients.⁹ The magnitude of the variable errors of
measurement in the individual scores is indicated by the standard or
probable error of measurement of a score.¹⁰ If one is using the obtained
test scores, the following formula should be used to compute
the standard error of measurement.¹¹


5Variable errors of measurement differ for different members of a group not only in magnitude but 
in direction as well. Such errors tend to distribute themselves about zero as a mean according to the 
normal distribution, or curve of chance. The fact that variable errors group themselves in this way 
justifies one in saying that the chances are greatest that the true mean is close to the observed mean; 
the chances of the true mean being anything very different from the observed mean decrease the further 
we get from the observed mean. 


6It is desirable that two parallel forms be used.
7Another means of determining this coefficient is by correlating the scores for odd and even items
in a single application of the test and correcting the obtained coefficient with the Spearman-Brown formula.
8A number of statisticians, including Kelley, make use of the symbol $r_{1I}$ in place of $r_{12}$.
9Kelley, T. L. Statistical Method. New York: The Macmillan Company, 1923, p. 222.
10The standard and probable errors of a score apply to all the individual scores from which they 
were calculated, collectively. They do not apply without this interpretation to any given individual score. 
11The probable error of any measure may always be obtained by multiplying the standard error 
by the constant .6744897 or .6745. This statement applies particularly to the formulae for $\sigma_{\text{Meas.}}$,
$\sigma_{(m+v)}$, and $\sigma_{(m+s)}$, which are changed to $P.E._{\text{Meas.}}$ and so on by merely inserting the constant
.6745 before the expression for the standard error.
$$\sigma_{\text{Meas. score}} = \sigma_{\text{Dist.}}\sqrt{1 - r_{12}}$$
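The standard error of measurement of a score, together with Kelley's range correction, can be sketched as follows. The sketch assumes the form in which Kelley's relation is usually stated: the standard error of measurement, $\sigma\sqrt{1 - r}$, is taken to be the same for groups of different spread, so $r_{\text{new}} = 1 - (\sigma_{\text{known}}/\sigma_{\text{new}})^2(1 - r_{\text{known}})$. The numbers and function names are hypothetical; the .6745 factor converts a standard error to a probable error.

```python
from math import sqrt

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def se_measurement_of_score(sd_dist, r12):
    """sigma_Meas.score = sigma_Dist * sqrt(1 - r12), for obtained scores."""
    return sd_dist * sqrt(1 - r12)


def reliability_for_new_spread(r_known, sd_known, sd_new):
    """Range correction: assume sd_known*sqrt(1 - r_known) = sd_new*sqrt(1 - r_new)
    and solve for r_new."""
    return 1 - (sd_known / sd_new) ** 2 * (1 - r_known)


# Hypothetical: r12 = .91 was reported for pupils whose scores had sigma = 10,
# while the experimental pupils' scores have sigma = 6.
r_experiment = reliability_for_new_spread(0.91, 10, 6)
se = se_measurement_of_score(6, r_experiment)
print(round(r_experiment, 2), round(se, 2), round(PE_FACTOR * se, 2))
```

The narrower group gets the lower reliability (.75 here), while its standard error of measurement stays at the same 3.0 points as in the wider group.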


It is known that, because of variable errors, obtained test scores 
tend to vary more widely from the mean than do true test scores. 
Kelley¹² has shown that this variability may be reduced and some of
the error eliminated by computing ‘‘estimated true scores” by means 
of the following regression equation: 


$$\tilde{X} = r_{12}X + (1 - r_{12})M_x$$

where
$\tilde{X}$ = regressed or estimated true score
$X$ = obtained test score
$M_x$ = mean of the distribution of scores
$r_{12}$ = the coefficient of reliability (Kelley uses $r_{1I}$ for the same thing.)
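Kelley's regression equation is a one-line transformation. In this Python sketch the obtained scores and $r_{12} = .8$ are hypothetical; it shows that every score moves toward the mean and that the spread of the regressed scores is $r_{12}$ times the spread of the obtained scores.

```python
from statistics import mean, pstdev


def estimated_true_scores(obtained_scores, r12):
    """Kelley's regression: X_est = r12 * X + (1 - r12) * M_x."""
    m_x = mean(obtained_scores)
    return [r12 * x + (1 - r12) * m_x for x in obtained_scores]


# Hypothetical obtained scores and reliability coefficient.
obtained = [40, 45, 50, 55, 60]
regressed = estimated_true_scores(obtained, 0.8)
print([round(x, 1) for x in regressed])                 # pulled toward the mean of 50
print(round(pstdev(regressed) / pstdev(obtained), 2))   # spread shrinks by r12
```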

After obtained scores have been changed to "estimated true
scores," the magnitude of the variable error in the individual scores
is indicated by the standard error of measurement found by the
following formula:¹³


$$\sigma_{\text{Meas. Est. True Score}} = \sigma_{\text{Dist.}}\sqrt{r_{12} - r_{12}^2}$$
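The two standard errors of measurement can be compared directly. Since $r_{12} - r_{12}^2 = r_{12}(1 - r_{12})$, the error for estimated true scores is the error for obtained scores multiplied by $\sqrt{r_{12}}$, and so is always the smaller of the two. The values in this Python sketch are hypothetical.

```python
from math import sqrt


def se_measurement_obtained(sd_dist, r12):
    """sigma_Meas.score = sigma_Dist * sqrt(1 - r12)."""
    return sd_dist * sqrt(1 - r12)


def se_measurement_estimated_true(sd_dist, r12):
    """sigma_Meas.Est.True Score = sigma_Dist * sqrt(r12 - r12**2)."""
    return sd_dist * sqrt(r12 - r12 ** 2)


# Hypothetical: sigma_Dist = 10 and r12 = .84.
print(round(se_measurement_obtained(10, 0.84), 2),
      round(se_measurement_estimated_true(10, 0.84), 2))
```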


The variable error of measurement of the mean of a group of test 
scores is obtained by dividing the appropriate formula above by the 
square root of the size of the group. If one is dealing with obtained 
scores, the formula becomes: 


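$$\sigma_{\text{Meas. } M} = \frac{\sigma_{\text{Dist.}}\sqrt{1 - r_{12}}}{\sqrt{N}}$$

If one is dealing with regressed or "estimated true scores," the
formula is:

$$\sigma_{\text{Meas. } M} = \frac{\sigma_{\text{Dist.}}\sqrt{r_{12} - r_{12}^2}}{\sqrt{N}}$$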
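Dividing by $\sqrt{N}$ gives the errors of the means. This Python sketch uses hypothetical values ($\sigma_{\text{Dist.}} = 10$, $r_{12} = .91$, $N = 25$); the function name and the `estimated_true` switch are illustrative.

```python
from math import sqrt


def se_measurement_of_mean(sd_dist, r12, n, estimated_true=False):
    """Per-score standard error of measurement divided by sqrt(N). Uses
    sqrt(1 - r12) for obtained scores and sqrt(r12 - r12**2) for regressed
    ("estimated true") scores."""
    spread = sqrt(r12 - r12 ** 2) if estimated_true else sqrt(1 - r12)
    return sd_dist * spread / sqrt(n)


# Hypothetical: sigma_Dist = 10, r12 = .91, N = 25 pupils.
print(round(se_measurement_of_mean(10, 0.91, 25), 3),
      round(se_measurement_of_mean(10, 0.91, 25, estimated_true=True), 3))
```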


These formulae are the appropriate ones to use in determining the 
standard errors of measurement of the means $E_1$, $E_2$, $C_1$, and $C_2$.
The $\sigma_{\text{Dist.}}$ used is that obtained by calculation from the distribution of
obtained initial or final test scores corresponding to $E_1$, $E_2$, $C_1$, or $C_2$


12Kelley, T. L. Interpretation of Educational Measurements. Yonkers-on-Hudson:
World Book Company, 1927, p. 178.
13For a description and derivation of this formula, see: Kelley, op. cit., pp. 176-77.
[...] the distribution of the obtained scores for which the error is being [...]
for which the error is being computed and may be designated by the
symbols $\sigma_{\text{Dist. }E_1}$, $\sigma_{\text{Dist. }E_2}$, $\sigma_{\text{Dist. }C_1}$, or $\sigma_{\text{Dist. }C_2}$.

In determining the standard error of measurement of Gain E or
Gain C, one should insert the values of the standard errors of
measurement of $E_1$ and $E_2$, or $C_1$ and $C_2$, obtained by the use of the above
formulae in the formulae below.¹⁴ If one has determined the probable
errors of measurement of $E_1$ and $E_2$, or $C_1$ and $C_2$, the formulae to be
used are similar. In place of each standard error substitute the
corresponding probable error in order to obtain the probable error of
measurement of Gain E or Gain C.¹⁵


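$$\sigma_{\text{Meas. Gain }E} = \sqrt{\sigma^2_{\text{Meas. }E_1} + \sigma^2_{\text{Meas. }E_2} - 2\,r_{E_1E_2}\,\sigma_{\text{Meas. }E_1}\,\sigma_{\text{Meas. }E_2}}$$

$$\sigma_{\text{Meas. Gain }C} = \sqrt{\sigma^2_{\text{Meas. }C_1} + \sigma^2_{\text{Meas. }C_2} - 2\,r_{C_1C_2}\,\sigma_{\text{Meas. }C_1}\,\sigma_{\text{Meas. }C_2}}$$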
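The gain formula combines the errors of the initial and final means with their correlation. In this Python sketch the errors (.60 and .80) and $r_{E_1E_2} = .50$ are hypothetical, and the function name is illustrative. Because the formula scales linearly, feeding it probable errors instead of standard errors returns the probable error of the gain, as the text states.

```python
from math import sqrt


def se_measurement_of_gain(se_initial, se_final, r_initial_final):
    """sqrt(s1**2 + s2**2 - 2*r*s1*s2), where s1 and s2 are the standard (or
    probable) errors of measurement of the initial and final means."""
    return sqrt(se_initial ** 2 + se_final ** 2
                - 2 * r_initial_final * se_initial * se_final)


# Hypothetical errors of E1 and E2 and their correlation r_E1E2.
se_gain_e = se_measurement_of_gain(0.60, 0.80, 0.50)
pe_gain_e = se_measurement_of_gain(0.6745 * 0.60, 0.6745 * 0.80, 0.50)
print(round(se_gain_e, 3), round(pe_gain_e, 3))
```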


Standard errors of measurement of the mean gains may be computed
in another way with equivalent results. To do so requires
calculation of the individual gains by subtracting $e_1$ from $e_1'$, $e_2$ from $e_2'$,
$c_1$ from $c_1'$, and so on for all the individuals participating in the
experiment. The standard error of the mean gain is then calculated by
the appropriate formula below:¹⁶


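For obtained scores:

$$\sigma_{\text{Meas. Gain }E\text{ or }C} = \frac{\sigma_{\text{Dist. of individual gains}}\sqrt{1 - r_{12}}}{\sqrt{N}}$$

For regressed (“estimated true”) scores:

$$\sigma_{\text{Meas. Gain }E\text{ or }C} = \frac{\sigma_{\text{Dist. of individual gains}}\sqrt{r_{12} - r_{12}^{2}}}{\sqrt{N}}$$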
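The individual-gains route can be sketched as below. The scores are hypothetical, the function name is an assumption, and the standard deviation is taken with divisor $N$ (`statistics.pstdev`), on the assumption that this matches the σ of the distribution intended in the text. Whether $r_{12}$ needs correcting for the spread of the gains (footnote 16) is left to the user.

```python
import math
import statistics

def se_mean_gain_from_individual_gains(initial, final, r12, regressed=False):
    """Standard error of measurement of a mean gain, computed from individual gains.

    initial, final -- paired initial and final scores of the same pupils, in the same order
    r12            -- reliability coefficient to apply (any correction is the user's choice)
    """
    if not initial or len(initial) != len(final):
        raise ValueError("need two non-empty lists of paired scores")
    gains = [f - i for i, f in zip(initial, final)]   # e1' - e1, e2' - e2, ...
    sd_gains = statistics.pstdev(gains)               # sigma with divisor N
    factor = math.sqrt(r12 - r12 ** 2) if regressed else math.sqrt(1.0 - r12)
    return sd_gains * factor / math.sqrt(len(gains))

# Hypothetical scores for five pupils (gains of 3, 1, 4, 2, 5).
initial = [70, 72, 68, 75, 71]
final = [73, 73, 72, 77, 76]
result = se_mean_gain_from_individual_gains(initial, final, 0.85)
print(round(result, 4))  # 0.2449
```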


14This procedure is justified only when the same test, or equivalent forms of it, are administered at the beginning and end of the experiment. When different tests are given, they may differ in units and zero points, and such differences prevent the computation of gains. When the use of the same test, or equivalent forms, is not feasible, comparison must be restricted to the final test means, $E_2$ and $C_2$, and the standard error of the difference between these means computed by the formula:

$$\sigma_{\text{Meas.}(E_2 - C_2)} = \sqrt{\sigma^{2}_{\text{Meas.}E_2} + \sigma^{2}_{\text{Meas.}C_2}}$$
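15This statement applies also to formulae given later for $\sigma_{\text{Meas.}(E_2 - C_2)}$, $\sigma_{(m+v)\text{Gain }E}$, $\sigma_{(m+v)\text{Gain }C}$, $\sigma_{(m+v)D}$, $\sigma_{(s+m)\text{Gain }E}$, $\sigma_{(s+m)\text{Gain }C}$, and $\sigma_{(s+m)D}$.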


The coefficients of correlation used in these formulae are, theoretically, those between the mean of initial test scores ($E_1$) and the mean of final test scores ($E_2$) of a large number of similar experimental groups, and likewise for the control groups. In practice, the coefficient used is obtained by correlating the initial and final test scores of the experimental group to give $r_{E_1E_2}$, and the initial and final test scores of the control group to give $r_{C_1C_2}$. For justification of this, see:


Kelley, T. L. Statistical Method. New York: The Macmillan Company, 1923, p. 178.

16. . . should be corrected to correspond with the standard deviation of the individual gains used. See page 61.
To determine the standard error of measurement of the difference in gains, D, one should insert the values of the standard errors of measurement of Gain E and Gain C in the formula below.17


$$\sigma_{\text{Meas.}D} = \sqrt{\sigma^{2}_{\text{Meas. Gain }E} + \sigma^{2}_{\text{Meas. Gain }C}}$$
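The two routes to $\sigma_{\text{Meas.}D}$, through the two gains and through the single formula of footnote 17, can be compared in a short Python sketch. The function names are hypothetical; the inputs are the rounded values of the hypothetical example that follows.

```python
import math

def se_difference_in_gains(se_gain_e: float, se_gain_c: float) -> float:
    """sigma_Meas.D from the errors of Gain E and Gain C.

    As in the text, no term for a correlation between the gains of paired pupils is included.
    """
    return math.hypot(se_gain_e, se_gain_c)

def se_difference_one_step(se_e1, se_e2, se_c1, se_c2, r_e1e2, r_c1c2):
    """Footnote 17's single formula, working directly from the errors of the four means."""
    variance = (se_e1 ** 2 + se_e2 ** 2 + se_c1 ** 2 + se_c2 ** 2
                - 2.0 * r_e1e2 * se_e1 * se_e2
                - 2.0 * r_c1c2 * se_c1 * se_c2)
    return math.sqrt(variance)

two_step = se_difference_in_gains(0.4438, 0.4994)
one_step = se_difference_one_step(0.5887, 0.5763, 0.5856, 0.6073, 0.71, 0.65)
print(round(two_step, 4), round(one_step, 4))  # 0.6681 0.6681
```

The agreement is expected: squaring the two gain formulae and adding them gives exactly the expression under the radical in the one-step formula.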


The following hypothetical example illustrates the use of the preceding formulae. It is assumed that equivalent forms of an achievement test were used whose coefficient of reliability, $r_{12}$, is equal to .85. It is also assumed that the correlations between the initial and final test scores have been computed for both groups, and that the means and standard deviations of the four distributions have been obtained. These hypothetical values are:


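$$
\begin{array}{lll}
r_{12} = .85 & E_1 = 73.32 & \sigma_{\text{Dist.}E_1} = 7.60 \\
r_{E_1E_2} = .71 & E_2 = 76.25 & \sigma_{\text{Dist.}E_2} = 7.44 \\
r_{C_1C_2} = .65 & C_1 = 73.20 & \sigma_{\text{Dist.}C_1} = 7.56 \\
N = 25 & C_2 = 74.12 & \sigma_{\text{Dist.}C_2} = 7.84
\end{array}
$$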


$$\text{Gain }E = E_2 - E_1 = 76.25 - 73.32 = 2.93$$

$$\text{Gain }C = C_2 - C_1 = 74.12 - 73.20 = .92$$

D, the difference in gains:

$$D = \text{Gain }E - \text{Gain }C = 2.93 - .92 = 2.01$$


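$$\sigma_{\text{Meas.}E_1} = \frac{7.60\sqrt{1 - .85}}{\sqrt{25}} = .5887$$

17One step of the total procedure may be eliminated by the use of the following formula:

$$\sigma_{\text{Meas.}D} = \sqrt{\sigma^{2}_{\text{Meas.}E_1} + \sigma^{2}_{\text{Meas.}E_2} + \sigma^{2}_{\text{Meas.}C_1} + \sigma^{2}_{\text{Meas.}C_2} - 2\,r_{E_1E_2}\,\sigma_{\text{Meas.}E_1}\,\sigma_{\text{Meas.}E_2} - 2\,r_{C_1C_2}\,\sigma_{\text{Meas.}C_1}\,\sigma_{\text{Meas.}C_2}}$$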


For a derivation of this formula with respect to errors of sampling, see:

Lindquist, E. F. and Foster, R. R. “On the Determination of Reliability in Comparing the Final Mean-Scores of Matched Groups,” Journal of Educational Psychology, 20:102-106, February, 1929.

The comment might be made that these formulae neglect the correlation that may exist between the gains of the paired pupils. In other words, the expression $-2\,r_{g_Eg_C}\,\sigma_{\text{Meas. Gain }E}\,\sigma_{\text{Meas. Gain }C}$, where $r_{g_Eg_C}$ is the coefficient obtained by correlating the distribution of individual gains of the experimental pupils with the distribution of individual gains of the control pupils, should also be included under the radical of the formula given above, or under the radical in the long formula just given. The authors just referred to justify its exclusion by the statement, “But since there can be no real correlation between the scores of one group and those of another . . . may be omitted from the equation.” Op. cit., p. 105.

Coefficients of correlation are regularly obtained by correlating two distributions of measures of the same individuals. The uncertain conclusions of research on the effect of practice on individual differences would cause one to question the dependability of a coefficient obtained by correlating gains of paired individuals. Owing to the uncertainty of this correlation, the probable and standard errors obtained with the above formula are interpreted as “limits beyond which the true error cannot fall.” For arguments in favor of the inclusion of this expression, see:

Walker, H. M. “Concerning the Standard Error of a Difference,” Journal of Educational Psychology, 20:53-60, January, 1929.
$$\sigma_{\text{Meas.}E_2} = \frac{7.44\sqrt{1 - .85}}{\sqrt{25}} = .5763$$

$$\sigma_{\text{Meas.}C_1} = \frac{7.56\sqrt{1 - .85}}{\sqrt{25}} = .5856$$

$$\sigma_{\text{Meas.}C_2} = \frac{7.84\sqrt{1 - .85}}{\sqrt{25}} = .6073$$


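$$\sigma_{\text{Meas. Gain }E} = \sqrt{(.5887)^2 + (.5763)^2 - 2 \times .71 \times .5887 \times .5763} = .4438$$

$$\sigma_{\text{Meas. Gain }C} = \sqrt{(.5856)^2 + (.6073)^2 - 2 \times .65 \times .5856 \times .6073} = .4994$$

$$\sigma_{\text{Meas.}D} = \sqrt{(.4438)^2 + (.4994)^2} = .6681 \text{ or } .67$$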
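The whole hypothetical example can be checked end to end in Python, starting from the means, standard deviations, and correlations listed above. This is a sketch for verifying the arithmetic, not the authors' procedure; the variable and function names are assumptions, and the intermediate errors are carried unrounded.

```python
import math

# Hypothetical values of the example above.
R12, N = 0.85, 25
MEANS = {"E1": 73.32, "E2": 76.25, "C1": 73.20, "C2": 74.12}
SDS = {"E1": 7.60, "E2": 7.44, "C1": 7.56, "C2": 7.84}
R_E1E2, R_C1C2 = 0.71, 0.65

def se_meas_mean(sd):
    """Standard error of measurement of a mean of obtained scores."""
    return sd * math.sqrt(1.0 - R12) / math.sqrt(N)

def se_gain(se_initial, se_final, r):
    """Standard error of a mean gain from the errors of the two means."""
    return math.sqrt(se_initial ** 2 + se_final ** 2 - 2.0 * r * se_initial * se_final)

se = {name: se_meas_mean(sd) for name, sd in SDS.items()}
gain_e = MEANS["E2"] - MEANS["E1"]
gain_c = MEANS["C2"] - MEANS["C1"]
d = gain_e - gain_c
se_d = math.hypot(se_gain(se["E1"], se["E2"], R_E1E2),
                  se_gain(se["C1"], se["C2"], R_C1C2))

print(f"D = {d:.2f}, sigma_Meas.D = {se_d:.4f}, D / sigma = {d / se_d:.2f}")
```

Carried without intermediate rounding, the computation agrees with the hand calculation to four decimals (.6681), and the ratio of D to its error comes out at about 3.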


Since the difference in gains, which is 2.01, is three times as large as the standard error of measurement of the difference, which is .67, the following interpretation is justified. If only the variable error of measurement is considered, and if it is assumed that errors due to faulty equivalence, failure to control external non-experimental factors, and departure from validity of the measuring instruments have been eliminated or otherwise accounted for, then for the groups concerned, and only for the groups concerned, the difference in achievement indicates the superiority of the status of the experimental factor prevailing in the experimental group. Subject to the limitations just expressed, the chances that the observed difference has the same sign, or is in the same direction, as the true difference are greater than 740 to 1.19 Stated in another way, if the experiment could be repeated with the same groups, under the same conditions, the chances of obtaining another observed difference of the same sign, or in the same direction, are greater than 740 to 1.20

The example given illustrates the calculation when obtained 
scores and the standard errors are used. Other examples might have 
been given using regressed scores with the standard errors, obtained 
scores with the probable errors, or regressed scores with the probable 
errors. Although the calculation of these is similar, care must be 


19As has already been explained, the standard or probable error obtained by the procedure outlined 
is regarded as a limit. If it were feasible to obtain a reliable coefficient for the small amount of correla- 
tion that may exist between the gains of the paired pupils and thus to arrive at a more accurate and an 
always smaller standard error, the chances of statistical significance would of course be greater. 

20The comment might be made in regard to this interpretation that repetition under the same experimental conditions with the same groups should secure differences not only of the same sign, but of the same magnitude. Identical differences would be secured with identical conditions and groups, if it were not for the unreliability of the initial and final tests. The standard and probable errors of measurement of a difference allow for this unreliability and nothing else.
taken to use the appropriate formulae. The following table gives the chances of statistical significance of differences that are a given number of times larger than the standard or probable error of the difference. The second column gives the chances that the true difference falls within the range, plus and minus, of the indicated multiple of the probable or standard error of the difference. This interpretation is less applicable to experimentation than that given in the third column. The experimenter is most interested, not in the magnitude of the observed difference, but in the probability that the observed difference has the same sign as the true difference. When these chances are great, 369 to 1 or better, the experimenter is justified in asserting that the variable errors of measurement do not destroy the dependability of a conclusion in favor of the superiority of the experimental factor.21


TABLE I.
CHANCES OF STATISTICAL SIGNIFICANCE OF A DIFFERENCE

                  The chances that the true       The chances that the true
                  difference does not differ      difference has the same
                  from the observed difference    sign, or is in the same
                  by more than the indicated      direction, as the observed
                  amount.                         difference.

D = 1 σ_D         2.15 to 1                       5.3 to 1
D = 2 σ_D         21 to 1                         43 to 1
D = 2.78 σ_D*     184 to 1                        369 to 1
D = 3 σ_D         369 to 1                        740 to 1
D = 4 σ_D         …                               …
D = 1 PE_D        1 to 1                          3 to 1
D = 2 PE_D        4.6 to 1                        10.3 to 1
D = 3 PE_D        …                               45 to 1
D = 4 PE_D        142.5 to 1                      286 to 1
D = 5 PE_D        1,340 to 1                      2,684 to 1


*This multiple of the standard error of difference appears in McCall’s formula for the experimental coefficient:

$$\text{E.C.} = \frac{D}{2.78 \times \sigma_{\text{Difference}}}$$


When the expression is equal to 1.0, the chances that the true difference has the same sign are in the ratio of 369 to 1. McCall uses this as the critical point below which differences should not be recognized as significant: if the chances are greater than 369 to 1, the difference is to be recognized as significant. The statement, “An experimental coefficient of 1.0 is just exactly practical certainty. An experimental coefficient of .50 means half certainty, one of 2.0 means double certainty and so on,” is not very meaningful, since it is impossible to multiply certainty. See:

McCall, W. A. How to Measure in Education. New York: The Macmillan Company, 1922, p. 404-405.
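The entries of Table I follow from the normal probability curve, so they can be recomputed. The Python sketch below assumes a normal distribution of errors and takes one probable error as .6745 standard errors; the function names are hypothetical. It uses `math.erfc` from the standard library for the normal tail area.

```python
import math

PE_FACTOR = 0.6745  # one probable error expressed in standard errors (normal curve)

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chances(multiple_of_sigma):
    """Odds 'x to 1' for a difference equal to the given multiple of its standard error.

    Returns (within, same_sign):
      within    -- odds that the true difference lies within +/- that amount of the observed one
      same_sign -- odds that the true difference has the same sign as the observed one
    """
    tail = upper_tail(multiple_of_sigma)
    within = (1.0 - 2.0 * tail) / (2.0 * tail)
    same_sign = (1.0 - tail) / tail
    return within, same_sign

def experimental_coefficient(difference, sigma_difference):
    """McCall's experimental coefficient, D / (2.78 * sigma_D)."""
    return difference / (2.78 * sigma_difference)

rows = [("1 SE", 1.0), ("2 SE", 2.0), ("2.78 SE", 2.78), ("3 SE", 3.0), ("4 SE", 4.0)]
rows += [(f"{k} PE", k * PE_FACTOR) for k in range(1, 6)]
for label, k in rows:
    within, same = chances(k)
    print(f"D = {label:<8} {within:>12,.1f} to 1 {same:>12,.1f} to 1")
```

Computed this way, the entries for 2.78 σ come out at about 183 and 367 to 1 rather than 184 and 369 to 1; the small discrepancy presumably reflects the precision of the tables then in use.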


The allowance for variable errors of validity. Achievement is not a unitary thing. It includes three types of controls of conduct: (1) specific habits; (2) knowledge; (3) general patterns of conduct. In a given case the achievement to be considered may be restricted to only certain elements under one of these rubrics. For example, in an experiment to determine the relative merits of two methods of teaching addition, the achievement to be measured might be restricted to the skills (specific habits) that function in doing examples of addition of a specified type. In an experiment to determine the relative effect of certain methods of teaching English literature in the high school, the achievement to be measured might be restricted to changes in the interest of the pupils in reading literature of a specified type. On the other hand, when the problem of an experiment asks about the effect of an educative factor without any restrictions, there is the implied requirement of measuring all elements of the resulting pupil achievement, which may include specific habits, knowledge, and general patterns of conduct.

21See footnote on page 60.

In order for an achievement test or a group of such tests to yield results that are valid for a given experiment, it must measure, either directly or indirectly, all of the elements of the achievement specified or implied by the statement of the problem of the experiment, or a representative sample of them.

The allowance for the variable errors of validity can be calculated if the coefficient of validity is known. In order to obtain this coefficient it will, of course, be necessary to have a valid criterion measure of the achievement specified by the problem of the experiment. If such measures were available for the pupils in the experiment, they would be used, and the question of validity would be eliminated. This will seldom be the case, but it may happen that the test used has been validated before its use in the experiment by calculating the coefficient of correlation between the scores it yields and the valid criterion measures. If this coefficient, $r_{1c}$, is known and the standard deviations on which it is based are approximately equal to the standard deviation of the obtained scores, then the gross22 allowance for variable errors of validity (validity and measurement) may be calculated by the following formulae:

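$$\sigma_{(m+v)E_1,\;E_2,\;C_1,\text{ or }C_2} = \frac{\sigma_{\text{Dist.}}\sqrt{1 - r_{1c}}}{\sqrt{N}}$$

$$\sigma_{(m+v)\text{Gain }E} = \sqrt{\sigma^{2}_{(m+v)E_1} + \sigma^{2}_{(m+v)E_2} - 2\,r_{E_1E_2}\,\sigma_{(m+v)E_1}\,\sigma_{(m+v)E_2}}$$

$$\sigma_{(m+v)\text{Gain }C} = \sqrt{\sigma^{2}_{(m+v)C_1} + \sigma^{2}_{(m+v)C_2} - 2\,r_{C_1C_2}\,\sigma_{(m+v)C_1}\,\sigma_{(m+v)C_2}}$$

$$\sigma_{(m+v)D} = \sqrt{\sigma^{2}_{(m+v)\text{Gain }E} + \sigma^{2}_{(m+v)\text{Gain }C}}$$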


22The allowance indicated by the use of these formulae will be too large, because the criterion meas- 
ures, as well as the measures being validated, include variable errors of measurement. It is useful, how- 
ever, as a limit beyond which the probable error of validity cannot go. 
The standard or probable error of the difference between the mean
gains secured by the use of these, or similar, formulae is to be inter- 
preted in the same way as the standard or probable error of the dif- 
ference between the mean gains due to variable errors of measurement 
alone. In the interpretation, however, it should be pointed out that 
the measure of error secured covers the variable errors of validity of 
the test used, the variable errors of measurement of the test used, 
and the variable errors of measurement of the criterion. To the extent 
that the criterion departs from validity itself, the measure also in- 
cludes variable errors of validity of the criterion. Therefore, the index 
secured should be dealt with as a limit rather than as an accurate 
means of allowing for variable errors of validity. If the difference 
between the means, or mean gains, is 2.78 or more times the standard 
error of this difference as found by the above formulae, one is justified 
in stating that it is statistically significant since the calculated meas- 
ure of error is known to be somewhat larger than the true measure of 
error.?° 

It will seldom be possible to calculate, even approximately, the 
allowance for chance errors of validity in the measures of achievement 
and, hence, usually the experimenter must make an estimate. In 
making this estimate, the specifications, both explicit and implied, of 
the problem of the experiment in regard to the achievement to be 
measured must be clearly recognized. In many cases the quality of 
permanency is implied, and when there is this implication, it must be
considered. No rules can be specified for estimating the allowance for 
chance errors of validity, but it is probably true that, except when the 
achievement is confined to specific habits or is relatively narrow, this 
allowance is likely to be equal to or greater than the allowance for 
chance errors of measurement. 


The allowance for systematic errors of measurement or validity, lack of equivalence of the groups, and non-equality of significant non-experimental factors. In considering the allowance for systematic24 errors of measurement or validity, it should be noted that the effect of a systematic error is eliminated from the mean gain in achievement when it is the same in the two sets of scores from which the gain is computed.25 Similarly, the effect is eliminated in the difference when


23See footnote on page 64. 


24Errors which are present in all the scores of a group, not necessarily of the same magnitude but always in the same direction, are called systematic errors. For example, if distances of 9, 12, and 15 feet were measured by a yardstick one-half inch too long, systematic errors of 1½, 2, and 2½ inches would occur. The older term “constant error” is less desirable, since it implies that the individual errors of the members of the group are of the same magnitude.

25Let $M_1 = 80$ (the initial test mean of the group).
Let $M_2 = 84$ (the final test mean of the group).
Then if each has a systematic error of +2, and if this is deducted, $M_1 = 78$ and $M_2 = 82$; but the difference between them, or gain, is still 4.
it is the same for the two gains that are compared.26 Hence, it is
necessary to consider only the cases in which these conditions do not 
prevail. 
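The cancellation argument of footnotes 25 and 26 can be restated as a few lines of Python arithmetic. The figures are those of the footnotes, except for the last case, which is a hypothetical contrast in which the error is present in only one of the two measurements; the function name is an assumption.

```python
def gain(initial_mean, final_mean):
    """Mean gain: final mean minus initial mean."""
    return final_mean - initial_mean

# Footnote 25: the same systematic error (+2) in both means leaves the gain at 4.
m1, m2, error = 80, 84, 2
print(gain(m1, m2), gain(m1 - error, m2 - error))  # 4 4

# Footnote 26: the same systematic error (-3) in both gains leaves their difference at 4.
g1, g2, error = 6, 2, -3
print(g1 - g2, (g1 - error) - (g2 - error))  # 4 4

# Hypothetical contrast: an error of +2 present only in the final mean is not eliminated,
# so the observed gain of 4 would contain 2 points of error.
print(gain(m1, m2 - 2))  # 2
```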

Systematic errors of measurement may result from the failure to 
control conditions at the time of measurement. For example, the 
teacher of the experimental group may permit the students of that 
group to spend a few more minutes on the final test than were allowed 
the control students. The control group may be given the test with 
more ample directions than are given to the experimental group. In 
interpreting a difference, an experimenter should inquire whether 
possible systematic errors of measurement or validity in the measures 
of achievement have been eliminated in the manipulation of the data. 
If it does not seem highly probable that they have been eliminated, he must estimate their probable effect upon the difference of gains.

Systematic errors of validity may result from failure to use tests 
which are equally valid with respect to the achievement of both 
groups. For example, in a comparison of the project and the assignment methods, the tests used may favor the specific abilities that the assignment method possibly develops more effectively and, hence, may not evaluate adequately the more general abilities acquired by the project pupils. Thus, a systematic error of validity may cause the difference to be interpreted in favor of the assignment method, whereas, if the values obtained by the project pupils had all been measured, the opposite conclusion might have been reached.

The allowances to be made for systematic errors of measurement 
and validity cannot be calculated in quantitative terms. Estimates 
must be determined and applied as limitations in the interpretation 
of results. 

It is difficult, if not impossible, to estimate accurately the effect of 
lack of equivalence of groups upon the difference between the gains. 
For example, consider an experiment in which the attempt is made to 
determine the relative effectiveness of two techniques of drill on 
arithmetical calculation—Technique X and Technique Y. Suppose 
the two groups differ slightly in mean mental age, that for Group A 
being 12.45 and that for Group B being 12.68. Assuming equivalence 
in all other respects, what allowance for this non-equivalence should 
be made in interpreting the difference between the gains made by the 
two groups? In order to answer this question accurately, it would be 


26Let $G_1 = 6$ (the gain in achievement for the experimental group).
Let $G_2 = 2$ (the gain in achievement for the control group).
Then if each has a systematic error of −3, and if this is deducted (algebraically), $G_1 = 9$ and $G_2 = 5$; but the difference between them is still 4.
necessary to know the influence of mental age upon the achievement
concerned under the conditions of the experiment. In general, this 
information is not available; therefore, the experimenter can only 
estimate roughly the probable effect of the lack of equivalence in 
mean mental ages. 

It should be noted that the influence of non-equivalence of groups upon the difference of the gains may be either positive or negative. This fact, expressed in equation form, would be:


Calculated difference = true difference + effect of non-equivalence 


Hence, the true difference may be exaggerated, minimized, or negated 
by the lack of equivalence of the groups. 

Failure to keep equivalent one or more of the important non- 
experimental factors will affect the difference of the gains in achieve- 
ment. For example, when a teacher of the experimental group be- 
lieves in the method of teaching that forms the experimental factor 
and is zealous in carrying it out, the gain of the experimental group 
is likely to be greater than it would be under a teacher who is prej- 
udiced against the method, or one who is neutral with reference to it. 
Hence, a lack of equivalence of “teacher zeal” will introduce a systematic error into one of the gains. It is difficult to determine the magnitude of such an error and, hence, to correct for it. An approximation to its magnitude may be obtained by performing a supplementary experiment in which the uncontrolled factor becomes the experimental factor. This is not often feasible. Usually, the experimenter goes to the literature for experimental evidence to prove that
this factor has a negligible effect on learning, or he gives reasons based 
on observations made during his own experiment to show that the in- 
fluence of the factor is not sufficient to destroy the significance of his 
findings. It is evident that failure to control important educative 
factors, with consequent introduction of constant errors of unknown 
magnitude and direction, will render small differences in gains 
insignificant. 

In concluding this consideration of systematic errors in experi- 
mental studies, it should be emphasized that the allowance to be 
made for them in interpreting the difference in gains is likely to be 
much larger than that to be made for the variable errors of measure- 
ment and of validity. Since the allowance for variable errors is in- 
versely proportional to the square root of the number of scores, this 
allowance will become relatively small when the size of the experi- 
mental and control groups is large. The size of these groups does not
affect the systematic errors, if other conditions remain the same. 
This qualifying clause is added because when conditions are varied 
in increasing the size of the groups, as is likely to be the case in a 
cooperative experiment, the systematic errors may be decreased. 
When this occurs, the variable errors are increased, and, conse- 
quently, the statement that the allowance for these errors is inversely 
proportional to the square root of the number of scores is not always 
true. However, it is impossible to know precisely what happens in a 
particular case; therefore, it is a safe plan to assume that the size of 
the group does not affect the systematic errors. 

The allowance in generalizing for non-representativeness of groups 
used in the experiment. In generalizing the interpretation of a differ- 
ence between gains, the representativeness of the groups of pupils 
used in the experiment must be considered. In doing this, two cases are encountered: (1) the pupils are selected in a random manner, so that the groups may be expected to be representative except for the operation of chance; (2) the pupils are not selected in a random manner, and the non-representativeness of the groups is due to factors other than chance. The allowance for probable non-representativeness in the
first case can be calculated by certain formulae, which are described 
below; but this case is not the one usually encountered. In fact, it 
would be practically impossible to select by a random method two 
groups of pupils for an experiment. For example, if it were desired 
to select two random groups from the fourth-grade pupils in a city- 
school system, the names of all such children could be arranged alpha- 
betically and the names taken from this list in a random manner. It 
is apparent, however, that such a procedure would seldom, if ever, be 
feasible. Hence, in generalizing the interpretation of a difference between gains, an experimenter has to deal with the second case, and
for this case, no formulae are applicable. 
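The random selection described above (an alphabetical list of all pupils, with names taken from it in a random manner) can be sketched in Python. The roster is hypothetical, the function name is an assumption, and `random.Random(seed).sample` is used here as one way of drawing without replacement so that no pupil falls in both groups.

```python
import random

def draw_two_random_groups(roster, group_size, seed=None):
    """Draw two non-overlapping random groups from a roster of names.

    The roster is arranged alphabetically first, as in the text; random.sample then
    takes names from the list without replacement, so no pupil is in both groups.
    """
    if group_size <= 0 or 2 * group_size > len(roster):
        raise ValueError("roster must hold at least two groups of the requested size")
    rng = random.Random(seed)
    drawn = rng.sample(sorted(roster), 2 * group_size)
    return drawn[:group_size], drawn[group_size:]

# Hypothetical roster of 200 fourth-grade pupils.
roster = [f"Pupil {i:03d}" for i in range(1, 201)]
experimental, control = draw_two_random_groups(roster, 25, seed=1929)
print(len(experimental), len(control), set(experimental) & set(control))  # 25 25 set()
```

The mechanics are trivial; the obstacle the text points to is administrative, since pupils so drawn would be scattered across many schools and classrooms.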

1. Generalizing from a random sample. A random sample is not necessarily a perfectly representative one.27 Chance is operative, and the mean of the sample may or may not coincide with the mean of the entire population from which the sample was drawn. The larger the sample, the greater the chances that the mean of the sample will be close to the mean of the population. It may fall above or below. Where it is likely to fall is shown by determining the standard or probable error of the mean due to sampling. The formula28 to be used is:

$$\sigma_{\text{Sampling}\,M} = \frac{\sigma_{\text{Dist.}}\sqrt{r_{12}}}{\sqrt{N}}$$

27For an excellent discussion of sampling, see:
Walker, H. M. “The Sampling Problem in Educational Research,” Teachers College Record, 30:760-74, May, 1929.


Since the experimenter needs to know the magnitude of the com- 
bined effect of sampling and the variable errors of measurement, the 
following formula may be used when dealing with random groups. 


$$\sigma_{(s+m)M} = \frac{\sigma_{\text{Dist.}}}{\sqrt{N}}$$
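The relation between the three errors of a mean (sampling alone, measurement alone, and the two combined) can be checked numerically. The Python sketch below assumes the two components are independent, so that their squares add; the function names are hypothetical and the inputs are the hypothetical values of the earlier example.

```python
import math

def se_sampling_only(sd_dist, r12, n):
    """Footnote 28: standard error of a mean due to sampling alone."""
    return sd_dist * math.sqrt(r12) / math.sqrt(n)

def se_measurement_only(sd_dist, r12, n):
    """Variable error of measurement of a mean of obtained scores."""
    return sd_dist * math.sqrt(1.0 - r12) / math.sqrt(n)

def se_sampling_and_measurement(sd_dist, n):
    """Combined standard error of a mean, sigma_Dist / sqrt(N)."""
    return sd_dist / math.sqrt(n)

sd, r12, n = 7.60, 0.85, 25   # hypothetical values from the earlier example
combined = se_sampling_and_measurement(sd, n)
parts = math.hypot(se_sampling_only(sd, r12, n), se_measurement_only(sd, r12, n))
print(round(combined, 4), round(parts, 4))  # 1.52 1.52
```

Algebraically, $r_{12} + (1 - r_{12}) = 1$, so the squared sampling and measurement errors always sum to the square of $\sigma_{\text{Dist.}}/\sqrt{N}$.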


The values obtained from this formula for the standard errors of the means $E_1$, $E_2$, $C_1$, and $C_2$ should be inserted in the following formulae29 to determine the standard errors of measurement and sampling of the mean Gains E and C.


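$$\sigma_{(s+m)\text{Gain }E} = \sqrt{\sigma^{2}_{(s+m)E_1} + \sigma^{2}_{(s+m)E_2} - 2\,r_{E_1E_2}\,\sigma_{(s+m)E_1}\,\sigma_{(s+m)E_2}}$$

$$\sigma_{(s+m)\text{Gain }C} = \sqrt{\sigma^{2}_{(s+m)C_1} + \sigma^{2}_{(s+m)C_2} - 2\,r_{C_1C_2}\,\sigma_{(s+m)C_1}\,\sigma_{(s+m)C_2}}$$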

If the individual gains have been determined, the standard errors of measurement and sampling of the mean gains may be determined by the following formula:

$$\sigma_{(s+m)\text{Gain }E\text{ or Gain }C} = \frac{\sigma_{\text{Dist. of individual gains}}}{\sqrt{N}}$$


28It has been shown that the standard error of the mean due to sampling alone is equal to $\sigma_{\text{Dist.}}\sqrt{r_{12}}/\sqrt{N}$, where $\sqrt{r_{12}}$ is the index of reliability of the measuring instrument used, and is not equal to $\sigma_{\text{Dist.}}/\sqrt{N}$, the formula generally used, which, for example, is given in:

Holzinger, K. J. Statistical Methods for Students in Education. Boston: Ginn and Company, 1928, p. 232.

Kelley, and Huffaker and Douglass, have shown that $\sigma_{\text{Dist.}}/\sqrt{N}$, ordinarily supposed to stand for merely the errors of sampling, includes both the errors of sampling and of measurement; see:

Kelley, T. L. “Note Upon Holzinger’s Formula for the Probable Error,” Journal of Educational Psychology, 14:376-77, September, 1923.
Huffaker, C. L. and Douglass, H. R. “On the Standard Errors of the Mean Due to Sampling and to Measurement,” Journal of Educational Psychology, 19:643-49, December, 1928.
29One is justified in using these formulae only when the same test, or equivalent forms of it, are administered at the beginning and end of the experiment. When this is not feasible, comparison must be restricted to the final test means, $E_2$ and $C_2$, and the standard error of measurement and sampling computed by the following formula:

$$\sigma_{(s+m)(E_2 - C_2)} = \sqrt{\sigma^{2}_{(s+m)E_2} + \sigma^{2}_{(s+m)C_2}}$$

See footnote on page 63.
The standard errors of the mean gains may then be inserted in the following formula30 to obtain the standard error of measurement and of sampling of the difference, D:

$$\sigma_{(s+m)D} = \sqrt{\sigma^{2}_{(s+m)\text{Gain }E} + \sigma^{2}_{(s+m)\text{Gain }C}}$$

The expression $-2\,r_{g_Eg_C}\,\sigma_{(s+m)\text{Gain }E}\,\sigma_{(s+m)\text{Gain }C}$ is not included in the above formula for the reason given in the footnote on page 64. In order to illustrate the use of these formulae, the data used in the example given to illustrate the calculation of the standard errors of measurement of a difference will be used again in calculating the standard error of measurement and sampling of the difference 2.01. The illustration applies to the procedure in which the standard errors of the initial and final test means are first computed.


$$\sigma_{(s+m)E_1} = \frac{7.60}{\sqrt{25}} = 1.520 \qquad \sigma_{(s+m)E_2} = \frac{7.44}{\sqrt{25}} = 1.488$$

$$\sigma_{(s+m)C_1} = \frac{7.56}{\sqrt{25}} = 1.512 \qquad \sigma_{(s+m)C_2} = \frac{7.84}{\sqrt{25}} = 1.568$$

$$\sigma_{(s+m)\text{Gain }E} = \sqrt{(1.520)^2 + (1.488)^2 - 2 \times .71 \times 1.520 \times 1.488} = 1.1458$$

$$\sigma_{(s+m)\text{Gain }C} = \sqrt{(1.512)^2 + (1.568)^2 - 2 \times .65 \times 1.512 \times 1.568} = 1.2894$$

$$\sigma_{(s+m)D} = \sqrt{(1.146)^2 + (1.289)^2} = 1.7248 \text{ or } 1.72$$

The chances that the true difference has the same sign, or is in the same direction, may be obtained approximately from the third column of Table I. Calculated more accurately, since the difference 2.01 is 1.17 times the standard error of the difference, the chances are slightly greater than 7 to 1 (interpreting the standard error as a limit) that the true difference will have the same sign. Authorities recommend that the experimenter should not consider


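30If the errors of $E_1$, $E_2$, $C_1$, and $C_2$ have been computed, the formulae given may be combined into:

$$\sigma_{(s+m)D} = \sqrt{\sigma^{2}_{(s+m)E_1} + \sigma^{2}_{(s+m)E_2} + \sigma^{2}_{(s+m)C_1} + \sigma^{2}_{(s+m)C_2} - 2\,r_{E_1E_2}\,\sigma_{(s+m)E_1}\,\sigma_{(s+m)E_2} - 2\,r_{C_1C_2}\,\sigma_{(s+m)C_1}\,\sigma_{(s+m)C_2}}$$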
such a difference as statistically significant when attempting to generalize from his data. The combined effect of variable errors of measurement and of variable errors of sampling, alone, is sufficient to render the statistical significance of the difference inferior to that customarily demanded. The findings may be considered significant so far as the groups themselves are concerned, since other factors were assumed to be controlled.
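The worked example of combined errors of measurement and sampling can be reproduced in Python, including the normal-curve odds that the true difference has the same sign as the observed one. This is a hedged sketch: it assumes normally distributed errors, carries the arithmetic without intermediate rounding, and uses hypothetical function names.

```python
import math

def se_gain(se_initial, se_final, r):
    """Standard error of a mean gain from the errors of the initial and final means."""
    return math.sqrt(se_initial ** 2 + se_final ** 2 - 2.0 * r * se_initial * se_final)

def odds_same_sign(difference, sigma_difference):
    """Normal-curve odds 'x to 1' that the true difference has the observed sign."""
    tail = 0.5 * math.erfc((difference / sigma_difference) / math.sqrt(2.0))
    return (1.0 - tail) / tail

N = 25
SDS = {"E1": 7.60, "E2": 7.44, "C1": 7.56, "C2": 7.84}       # hypothetical example values
se = {name: sd / math.sqrt(N) for name, sd in SDS.items()}   # sigma_(s+m) of each mean

se_gain_e = se_gain(se["E1"], se["E2"], 0.71)
se_gain_c = se_gain(se["C1"], se["C2"], 0.65)
se_d = math.hypot(se_gain_e, se_gain_c)
odds = odds_same_sign(2.01, se_d)
print(f"{se_gain_e:.3f} {se_gain_c:.3f} {se_d:.3f} {odds:.1f} to 1")
```

With these inputs the sketch gives errors of roughly 1.146 and 1.289 for the two gains, about 1.72 for the difference, and odds of a little more than 7 to 1, far short of the 369 to 1 discussed with Table I.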

If the chances had been near or greater than 369 to 1, as will fre- 
quently be the case, generalizations should still be made with con- 
siderable caution. The use of these formulae is justified only when 
the sample is random; and the chances given in the table refer only 
to the allowance to be made for variable errors of measurement and 
variable errors of sampling. They do not guarantee that the differ- 
ence is significant in spite of faulty equivalence, poor control of ex- 
perimental conditions, or even carelessness in computation. In other 
words, the difference is to be regarded as significant and worthy of 
being used as a basis of generalization when the following conditions 
have been satisfied: 

1. The groups are equivalent at the start of the experiment, or the 
departures from equivalence have been shown to be insig- 
nificant. 

2. All of the non-experimental factors have been controlled during 
the course of the experiment, or failure to control has been 
shown to be negligible in effect on the difference in gains. 

3. The measuring instruments used have been shown to possess 
high validity. Measures of achievement have been secured 
not only for specific abilities but for general abilities as well.

4. The testing conditions have been the same for both experi- 
mental and control groups, or systematic errors of measure- 
ment resulting from failure to secure identical testing con- 
ditions have been shown to be insignificant. 

5. The sample has been shown to be random, i.e. selected without 
bias from the population to which the generalization is 
applied.

6. The difference of the gains is equal to or greater than 2.78 
times the standard error of difference due to variable errors 
of measurement and sampling.31

When the above conditions have not been met, the conclusions must be appropriately restricted. In addition to making allowance for $\sigma_{(s+m)D}$ or $PE_{(s+m)D}$, it is necessary to allow for the estimated effect upon the difference between the gains of any failure to secure the conditions listed above.

31See page 60.

2. Generalizing from a sample that is not random. As pointed out 
on page 71, the experimenter usually works with groups that are not 
random samples of a larger population and, hence, must estimate 
rather than calculate, the allowance to be made for probable non- 
representativeness of the groups. For making this estimate, no spe- 
cific rules can be stated. As a general procedure the experimenter 
should consider all available evidence relative to the traits of the 
groups concerned. For example, the intelligence test scores will be 
known, and the experimenter should show how the mean and stand- 
ard deviation of these scores compare with the corresponding meas- 
ures of the larger population. If the available evidence indicates that 
the groups are highly representative of the larger population, he may 
generalize with considerable confidence; if the evidence indicates that 
the groups are not reasonably representative of the larger population, 
he must refrain from generalizing or appropriately limit his state- 
ments. 

Concluding statement. The preceding discussions of the signifi- 
cance of differences in gains should make it clear that caution must be 
exercised in interpreting a small difference and that the interpretation 
cannot be accomplished by the application of any formula or group of 
formulae. In general, it is necessary to inquire carefully and criti- 
cally into the conditions of the experiment; then the best that can be 
done is to estimate the allowance that should be made for imperfec- 
tions in the data. Since an estimate must be considered only an 
approximation, it follows that the interpretation of a relatively small 
difference in gains must be somewhat uncertain. When the difference
is relatively large, definite conclusions may be justified, but even
in this case they must be restricted to the conditions of the experiment.
For example, in the experiment with the project method
Collings³² obtained very large differences in gains, much larger than
any reasonable estimate of the total allowance for non-equivalence of
groups, variable and chance errors of measurement and of validity,
and failure to control non-experimental factors, provided zeal is
included in the experimental factor. Hence, he is justified in asserting
that under the conditions of the experiment the project method as
applied is distinctly superior to usual methods of teaching as exemplified
in the control schools. He is not justified in any statement,
except with appropriate qualifications, concerning the relative merits
of the project method in general or as it might be applied by other
teachers.

³² See page 36.

In regard to generalizing from an experiment, it should be noted 
that application of the formula: 


$$\sigma_{(s+m)D} = \sqrt{\sigma^2_{(s+m)\,\mathrm{Gain}_E} + \sigma^2_{(s+m)\,\mathrm{Gain}_C}}$$


is justified only when it is reasonably certain that any non-representa- 
tiveness is due to chance. If factors other than chance may have 
operated, this formula cannot yield the allowance that should be made 
for non-representativeness. Furthermore, it should be noted that 
this formula should be used only when generalization is attempted. 
If the conclusions are restricted to the groups of pupils concerned, 
its use is superfluous. 
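The formula above combines the standard errors of the experimental and control gains by adding their squares and taking the square root. A minimal Python sketch follows, assuming the two standard errors have already been estimated and are independent (no correlation term is included, matching the formula as written). It also converts the result to a probable error with the conventional factor 0.6745, since allowances are stated here in either form. The function names are hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 * standard error (normal curve)


def se_difference_of_gains(se_gain_exp: float, se_gain_ctrl: float) -> float:
    """sigma_(s+m)D = sqrt(sigma_E**2 + sigma_C**2), errors treated as independent."""
    return math.sqrt(se_gain_exp ** 2 + se_gain_ctrl ** 2)


def probable_error(sigma: float) -> float:
    """Convert a standard error into the corresponding probable error."""
    return PE_FACTOR * sigma


if __name__ == "__main__":
    # Illustrative standard errors of the two gains, not drawn from any study.
    sigma_d = se_difference_of_gains(3.0, 4.0)
    print(sigma_d)
    print(probable_error(sigma_d))
```

As the text cautions, the result is a proper allowance only when any non-representativeness of the groups can be attributed to chance.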


CHAPTER IV 


A CRITICAL EVALUATION OF EXPERIMENTAL STUDIES 
RELATING TO SUPERVISED STUDY 


In the following description of the requirements for a precise ex- 
periment it is assumed that the objective of supervised study is not 
primarily that of increasing achievement in the particular school sub- 
ject studied, but instead is that of engendering habits of study which 
will increase the effectiveness of the learner’s efforts generally, not 
only at the time of supervision, but in the future. Hence, the purpose 
of the experiment is to evaluate, in terms of the attainment of this 
objective, the group of techniques employed by the teacher during the 
supervised-study period by comparing the attainment of the super- 
vised pupils with that of pupils whose study is characterized by a 
different type of supervision, or more usually, by the absence of super- 
vision. 

Requirements for the conduct of an experiment in supervised study. 

1. Specification of supervised study as an experimental factor. The 
term “supervised study” is used to designate a variety of instructional
procedures and combinations of instructional procedures employed by 
the teacher in connection with the studying of his pupils and designed 
to guide them to the acquisition of efficient study procedures. These 
instructional procedures include making the assignment; giving gen- 
eral rules for study; making suggestions for the doing of learning ex- 
ercises; answering questions that members of the group desire to ask; 
inspecting work as it is being done, and calling attention to faults; 
giving direct assistance; suggesting supplementary learning exercises 
to individual pupils; providing aids for study such as reference books, 
maps, pictures, etc.; and maintaining a place suitable for study. Defi- 
nition of supervised study as an experimental factor requires detailed 
specification of the particular instructional procedures to be employed 
during the study period of the experimental group. Where comparison 
is to be made between two types of supervised study, the instructional 
procedures for the study periods of both groups must be specified in 
detail. This means that the supervised-study procedure to be
employed must be described in writing, or at least a detailed record must
be kept of what was done. 

2. Equivalent groups. The groups of pupils used in the experiment 
should be equivalent in all respects that affect methods of study and 
learning in the subject to be studied. This requirement can be ap-
proximated by pairing pupils on the basis of intelligence test scores 
and then comparing the groups thus formed with respect to chrono- 
logical age, previous achievement in the school subject, and study 
habits. If the differences between the means and the standard devia- 
tions of the groups with respect to these three characteristics are rela- 
tively small, the groups may be considered approximately equivalent. 
It is desirable that the groups also be approximately equivalent with 
respect to personality traits, physical conditions, sex, and race. 

3. Control of teacher factors. Control of the teacher factors in- 
volves maintaining the same status of all factors under this head in 
both the experimental and the control groups, except the procedures 
specified as supervised study and the corresponding procedures em- 
ployed with the control groups; or if the same status is not main- 
tained, the amount of non-equivalence must be measured and its effect 
upon the engendering of study habits and the experimental learning 
must be determined. The factors whose control in supervised-study 
experiments appears to be the most important are: (1) instructional
techniques employed during the recitation period, especially those re- 
lating to the assignment; (2) motivation techniques; (3) skill of 
teacher in carrying out instructional techniques and classroom-man- 
agement procedures; (4) zeal of teacher; (5) personality traits. In 
addition, care should be exercised to avoid marked differences in the 
minor teacher factors. 

4. Control of general and extra-school factors. The important fac- 
tors under this head are: (1) materials of instruction, (2) environ- 
ment in which pupils study, and (3) time per day devoted to study. 
The pupils in the control group should be given as convenient an 
access to reference books, maps, charts, and other aids to study as the 
members of the experimental group; and desks, chairs, light, heat, 
ventilation, and other aspects of the study environment should be 
identical for both groups. This means that the pupils of the control 
group should study in a classroom during the period devoted to super- 
vising the study of the other groups and in the presence of their 
teacher, but without any other supervision¹ than that required to
maintain order and to hold them to their tasks. The other general
school factors and the extra-school factors should be controlled by
means of the methods described on pages 45-47 and 49-50.

¹ Unless two types of supervised study are being compared.

5. Measurement and interpretation of differences in gains in
achievement and in the acquisition of study habits. Equivalent forms
of an achievement test of high reliability and validity should be used
as initial and final tests. An objective measurement of study habits 
should be made at the beginning and end of the experimental period 
so that the relative acquisition of study habits may be compared. 
Both groups should study in the unsupervised fashion for a time after 
the close of the experiment. After a period of some months the pupils 
should be tested for achievement in the subject matter in which they
are at the time engaged, and for the possession of good study habits. 
Superiority in achievement for the experimental pupils, if shown after
this lapse of time, will constitute a weighty argument for the
supervised-study procedures. Retention of study habits after the removal
of supervision, as indicated by superiority in achievement and by their
direct measurement, means the attainment of the objective of
supervised study and, in consequence, a favorable conclusion for the
experimental factor.

In the interpretation of differences in gains in achievement, or in 
the acquisition of study habits, the techniques described in Chapter 
III of this bulletin should be used. 

6. Generalization. The groups of pupils should be representative 
of the population to which the generalizations are to be applied. Un- 
less the pupils involved in the experiment have been selected by a 
process of random sampling, evidence should be presented to show the 
extent to which the groups are typical. If the groups are not typical, 
the conclusions must be restricted accordingly. 

The teachers selected to conduct the experiment should be typical 
of those teaching the subject in general, and the instructional and 
classroom-management procedures they use in the recitation should 
be representative of sound educational practice. 
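Requirements 2 and 5 above involve two computations that are easy to sketch: forming groups by pairing pupils on intelligence-test scores and then comparing the groups' means and standard deviations on other traits, and computing each group's mean gain from initial and final tests of equivalent form. The Python sketch below is illustrative only. The `Pupil` record and its field names, the seeded coin toss that sends one member of each pair to each group, and the dropping of an unpaired pupil are all assumptions, not procedures prescribed in the text.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Pupil:
    # Hypothetical record; other traits named in requirement 2 could be added.
    name: str
    iq: float
    age: float
    prior_achievement: float
    initial_score: float = 0.0
    final_score: float = 0.0


def pair_by_iq(pupils, seed=0):
    """Rank pupils by IQ, pair neighbours, and split each pair between the groups."""
    rng = random.Random(seed)
    ranked = sorted(pupils, key=lambda p: p.iq, reverse=True)
    experimental, control = [], []
    # Step through adjacent pairs; an odd pupil left at the end is not used.
    for i in range(0, len(ranked) - 1, 2):
        first, second = ranked[i], ranked[i + 1]
        if rng.random() < 0.5:  # assumed coin toss decides who is experimental
            first, second = second, first
        experimental.append(first)
        control.append(second)
    return experimental, control


def compare_trait(group_a, group_b, trait):
    """Return (difference of means, difference of standard deviations) for a trait."""
    a = [getattr(p, trait) for p in group_a]
    b = [getattr(p, trait) for p in group_b]
    return (statistics.mean(a) - statistics.mean(b),
            statistics.pstdev(a) - statistics.pstdev(b))


def mean_gain(group):
    """Mean of final-minus-initial scores for one group."""
    return statistics.mean(p.final_score - p.initial_score for p in group)
```

Whether the reported differences in means and standard deviations are "relatively small" remains the experimenter's judgment; the sketch only computes them, and the difference of mean gains it yields must still be interpreted by the techniques of Chapter III.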


Descriptions of experiments in supervised study.²

1. Earhart, in 1906, conducted the pioneer experimentation in this field.³
The first of the experiments described in her monograph was, in a sense,
controlled. Five sixth-grade and four seventh-grade classes were
trained in finding the subject of the lesson, in organizing the
subject-matter, in verifying the authors’ statements, and in supplementing
the lesson. The possession of the abilities to perform these activities
was measured, before and after training, by means of sample
lessons for which the pupils were asked to indicate the steps they
would take in finding the subject, and so on. The measurement was
completed by a series of tests in which the pupils were requested to
perform the activities listed above. Fifteen sixth-grade and four
seventh-grade classes that had not received the training were given
the same tests and the results compared with those of the trained
classes. The second of the experiments reported in the monograph
was not controlled. A single group of twenty fourth-grade pupils
was used. The supervised study consisted of giving the pupils an
aim for each of sixteen lessons and encouraging them to ask questions
as they studied literature in the presence of the teacher. No quantitative
results are given for this second experiment. The following
conclusion is stated for both of the experiments:


The results of this series of lessons, coupled with the results of the tests in
geography given to the sixth and seventh grades, indicate strongly that pupils
in the elementary schools in grades including the fourth as well as higher classes,
are able not only to employ the factors of logical study, but also that by means
of systematic efforts, they can be made to improve in their employment of
them.⁴

² The experiments described are all those which have been reported in the more easily
accessible literature of education. In addition to those described, several experiments have
been reported merely as unpublished master’s and doctor’s theses in education.

³ Earhart, L. B. “Systematic Study in the Elementary Schools,” Teachers College, Columbia
University Contributions to Education, No. 18. New York: Bureau of Publications,
Teachers College, Columbia University. One of the experiments described is also reported in:
“An Experiment in Teaching Children How to Study,” Education, 30:236-44, December, 1909.


2. Breslich⁵ is to be credited with having conducted the first carefully
controlled experiment in supervised study. Two groups of high-school
pupils, of unreported size, were selected as approximately equal in
ability in algebra, as shown by their final examination grades of the
preceding semester. The control or unsupervised group recited in the
traditional manner for forty-five minutes and prepared the advance 
assignment during the study hour or at home. The experimental group 
recited during one period and used the next for study in the presence 
of the teacher, who employed the following instructional procedures:
passing about the room, watching the pupils at work, offering sug- 
gestions, giving no help until a serious effort had been made by the 
pupil, and stopping the whole class for discussion of mistakes that 
might become general. The technique used by the teacher is charac- 
terized by the adaptation of the instructional procedures to meet the 
needs of the moment or of the individual. The results, given in terms
of school grades at the end of fourteen weeks, show the supervised
group to have achieved slightly more. It is stated that the poorer 
pupils profited most and that the brighter students seemed to have 
suffered some loss. A rotation of the groups at the end of the fourteen 
weeks and a continuation of the experimentation for six lessons re- 
sulted in the former supervised group maintaining its superiority of 


⁴ Earhart, op. cit., p. 79.
achievement, thus indicating that study habits acquired under
supervision continued to function after the supervision had been removed.

3. Minnick⁶ conducted an experiment in supervised study in which
two groups of eighteen tenth-grade pupils in plane geometry were 
approximately equivalent with respect to means and measures of vari- 
ability of school grades in algebra. The experimental group met for 
forty minutes of recitation and forty minutes of study of the ad- 
vance assignment, during which the teacher answered questions and 
made suggestions. Additional work was given to the brighter students 
to keep them busy during the hour. The control group recited and 
studied in the usual fashion. Both groups had the privilege of asking 
the instructor questions during his consultation period. After a period 
of fifteen weeks, the results were reported in terms of daily recitation 
grades, grades on six-weeks tests, and grades on the final examina- 
tion. On the basis of the consistent superiority of the supervised 
group as shown by these grades, the following conclusion is reported: 
“. . . students under such instruction not only master the text
more thoroughly but are more able to take the initiative in new work
than are the students under the unsupervised plan.”⁷

4. In an uncontrolled experiment by White,⁸ all the classes of a
four-year high school, with the exception of those in shop or laboratory
courses, were given thirty minutes of directed study during the
sixty-five-minute divided periods. The chief supervised-study
procedure was that of help and encouragement of backward pupils. After
a trial of eight weeks, it is stated that the use of supervised study
resulted in lower costs, less withdrawal from school, more work for
the principal, and dislike for the plan by some of the teachers and
parents. Although the author states early in his report that “It is
worth much more to a pupil to have an instructor teach him how
to study than to teach him Latin or algebra,”⁹ nothing is said as to
whether or not this belief was justified by the experiment.

5. Dunn¹⁰ used two groups of eleven fourth-grade pupils in
language. Approximate equivalence was determined on the basis of 
scores on a standardized language test. The pupils of the supervised 
group were given directions for outlining and for studying by wholes 
rather than by parts during the study period of thirty minutes. The 
control pupils studied in a classroom for the same length of time 


⁶ Minnick, J. H. “An Experiment in the Supervised Study of Mathematics,” School Review,
21:670-675, December, 1913.
⁷ Ibid., p. 675.
⁸ White, E. A. “An Experiment in Supervised Study,” Educational Administration and
Supervision, 1:257-62, April, 1915.
⁹ Ibid., p. 258.
¹⁰ Dunn, G. A. “The Value of Supervised Study,” Teachers College Record, 18:430-437,
November, 1917.

without such supervision. At the end of the experimental period of
four weeks, the recitation grades of the pupils were compared, and 
on the basis of this comparison the following conclusion is reported: 
“. . . there is a decided difference in the results produced by the
two different methods of study, and, furthermore, the data would
suggest that the directed-study period is of vastly more value to
children than is the undirected-study period.”¹¹

6. Heck¹² has reported an experiment in arithmetic in which 141
fifth-, sixth-, seventh-, and eighth-grade pupils participated. The pupils 
were divided into two groups of approximately seventy pupils each. 
The method used is best explained in the words of the experimenter, 
“The half-year grades tested were 5A, 6B, 6A, 7B, and 8. . . . These
six grades were divided into two groups, as equally balanced as pos- 
sible; the first group was composed of grades 8, 7B, and 6B, and the 
second group of grades 7A, 6A, and 5A.”¹³ During the study period the
pupils of one of these groups worked out the examples in computa- 
tion and reasoning contained in one of the forms of the Courtis Arith- 
metic Tests. The pupils of the other group worked out the same prob- 
lems at home, without help, the same evening. On the following day 
the procedure was reversed for the two groups and another form 
of the same tests was used. The combined results for both groups 
indicated no significant differences between school environment and 
home environment as a factor in study. Somewhat the same procedure 
was employed by the author during an experiment in English com- 
position in which ninety-five high-school students of all classes par- 
ticipated. One group wrote a theme at school; the other wrote a 
theme at home. The groups were reversed, and a second theme was 
written. Again, no significant difference was found between school 
and home environment as a factor in study. 

7. Breed¹⁴ directed a cooperative experiment in fourteen schools.
In each, a group of ninth-grade pupils in a given subject was divided 
into an experimental and a control group on the basis of scores on an 
informal preliminary test, or on previous school marks. Each of the 
pairs of groups was taught by a single teacher. The experimental 
factor was twenty minutes of directed study during the fifty-minute 
divided period and had as its essential element guidance of pupils in 
applying study rules. The supervised-study groups recited for thirty 
minutes, while the unsupervised, or control, groups recited for fifty 


¹¹ Dunn, op. cit., p. 437.
¹² Heck, W. H. “Comparative Tests of Home Work and School Work,” Journal of Educational
Psychology, 10:153-62, March, 1919.

minutes each day. The control group studied in the traditional
fashion. After six weeks the groups were reversed, and the experiment
continued another six weeks. The conclusions state that, on the aver- 
age, supervision of study resulted in less efficient learning of algebra 
and English composition, but in more efficient learning of Latin. 
Breslich is substantiated in that supervised study favored the poorer 
students, but hindered the brighter. For this reason, Breed advocates 
a differential plan of study supervision. The results of this compre- 
hensive and fairly well controlled experiment are interesting in that 
they fail to confirm the claims made for supervised study by many 
enthusiasts of that time. 

8. Heckert¹⁵ has reported an experiment using a modified
double-period plan in English composition in which no definite portion of the
period was devoted to supervised study. Two equivalent groups of 
seventeen pupils each were selected on the basis of their mental-test 
scores and ratings of their compositions. The instructional procedures 
of the supervised-study group were as follows: diagnosis of individual 
difficulties in composition writing, aid in overcoming these difficulties, 
analysis of compositions, instruction in outlining and in the use of out- 
lines, and attempts to arouse enthusiasm for composition writing. 
Although the instructional procedures of the control group are not de- 
scribed, it is possible that they may have included some of the above. 
The statement is made that both groups had the same teacher for the 
recitation, but that the teacher was aided during the study period of 
the supervised group by the author. An attempt was made to arouse 
an equal amount of enthusiasm for composition writing among the 
control pupils, who were also prevented from devoting more time to
study than those of the supervised group, or from receiving assistance 
at home. At the end of twenty-five periods of one hour each the 
achievement of these pupils was again determined by means of
composition ratings. The conclusions state that supervised study in
English composition is “eminently worth while” and that “. . . under
fairly skillful direction the brighter children of a supervised class not
only make better progress than the brighter members of equal ability
of the unsupervised group but that they also make better progress
than the slower children of the supervised group.”¹⁶

¹⁵ Heckert, J. W. “The Effects of Supervised Study in English Composition,” Journal of

9. Beauchamp¹⁷ has reported the results of an experiment in physical
science in which two approximately equivalent groups of twenty-six
pupils each were selected on the basis of age, intelligence, reading
rate, and reading comprehension. Both of the groups received instruc- 
tion in methods of study during the first of six “units” of work. At 
the end of the first unit, which continued for four weeks, a change 
was made in the instructional procedures. The supervised-study 
group received specific instruction in studying a paragraph to de- 
termine its central idea, in finding and answering questions on the 
material assigned, in reading through the entire assignment before 
beginning an analytical study of its parts, and in solving thought 
questions. The unsupervised group continued to use the general meth- 
ods of study suggested to them during the first unit of work. The 
amount of time given to study, the material studied, and the environ- 
ment for the studying were the same for both groups. At the end of 
each unit of work the achievement of the pupils was evaluated by 
means of written reports, completion tests, and thought questions on 
the material covered in the unit. The acquisition of study habits of 
the supervised and unsupervised pupils was estimated by analysis and 
comparison of their study notes. The reading ability of the pupils 
was tested again at the end of the seven months of experimentation. 

The analysis of the study notes of the two groups of pupils showed 
that the pupils of the supervised group had acquired more effective 
habits of study. The other conclusions drawn by Beauchamp are 
given below:


1. Specific training in finding the central thought of a paragraph, determin- 
ing the questions one must be able to answer in order to obtain an adequate 
understanding of a topic, and reading an entire block of material through for 
its general plan, results in a more thorough comprehension of the subject-matter 
than undirected study on the same material. 

2. Specific training and practice in answering thought questions based on 
the application of some scientific principle are more efficient than incidental 
training in answering thought questions. 

3. Training the pupil to make various types of analyses of the subject- 
matter increases the ability of the pupil to interpret and reproduce what he 
reads. 

4. The gain in rate of silent reading is greater if the pupil is not required 
to make an analysis of what he reads.¹⁸


10. Brown and Worthington¹⁹ have reported the results of a
cooperative experiment in algebra, English, and United States history in
which five high schools participated. Seven pairs of equivalent groups, 
varying in size from twenty-three to thirty pupils, were selected on 
the following bases: school subject marks (three pairs of groups);
intelligence quotients (one pair of groups); intelligence scores, ratings 


¹⁸ Beauchamp, op. cit., p. 87.

on composition and spelling scales, and scores on a standardized
achievement test (one pair of groups); intelligence scores and
school-subject marks (one pair of groups); and intelligence and reading
scores (one pair of groups). Each of these pairs of groups was taught
by a single teacher. The supervised study consisted of “directing the 
mental operations, whether they are reciting, being assigned a lesson, 
or working out an assignment” during a sixty-minute period divided 
approximately into “twenty minutes for discussion and review of pre- 
vious work through recitation, examinations, etc.; fifteen minutes for 
assignment of new problem; twenty-five minutes for working out new 
problem.”²⁰ The control group spent fifteen minutes less time per day
with the teacher, since they met for a forty-five-minute recitation.
Neither group was limited in the time that might be spent in study outside
of class. 

The achievements of the pupils were evaluated in various ways: 
school marks (three pairs of groups); ratings on composition scale 
(one pair of groups); true-false test and general examination prepared
by teacher (one pair of groups); standardized test and semester
marks (one pair of groups); and standardized test (one pair of groups).
At the end of an experimental period of one semester the following 
conclusions were derived from the results: 

“1. Two pairs, the algebra classes in School C and the English
classes in School D, showed rather definitely that greater progress 
was made in the supervised-study groups; four pairs showed slight 
variation in progress, favorable to supervised study; and one pair, 
the United States history classes in School A, indicated that the 
recitation plan was superior as a method of instruction. 

2. Three supervised-study classes had fewer failures than the parallel 
recitation classes; one had the same number; one had more; and in 
the other two cases the number of failures could not be determined 
from the data submitted. 

3. In general, then, the objective data indicated a superiority of the 
supervised-study plan over the recitation plan as a method of instruc- 
tion. However, the objective data were not conclusive.” 


11. Johnson²² has reported an experiment in eighth-grade arithmetic
in which two groups approximately equivalent on the basis of
intelligence scores and past scholastic grades were used. The supervised
study was a composite of the following elements: a sixty-minute
period, of which at least twenty-five minutes were given to study
of the next day’s assignment in the presence of the teacher; ability
grouping with differentiated assignments in the supervised class; and
suggestive directions, given in mimeograph form and referred to during
the study period. The members of the control group were not
divided on the basis of ability, recited in the traditional manner, and
studied during the study period or at home. At the end of six weeks
the groups were reversed and the experiment continued for another
six weeks. The results are expressed in terms of the number of problems
attempted and the number of problems correctly solved on tests
given at intervals of three weeks. The following conclusion is derived
from these findings: “. . . that in order to train in efficient
and economical study habits the studying must be done under the
teacher’s immediate direction.”²³

²⁰ Brown and Worthington, op. cit., p. 604.
²² Johnson, A. W. “The Effectiveness of Directed Study,” Elementary School Journal,

12. Reeder²⁴ has reported an experiment in seventh-grade geography
in which two groups of twenty-three pupils each were rotated to
secure equivalence. In addition, the groups were considered equivalent 
because the pupils had had no previous acquaintance with the text- 
book studied during the experiment. No initial achievement test was 
administered, first, because the pupils had no specific knowledge of 
the subject-matter to be studied on which they could be tested, and 
second, because the administration of an initial test would give the 
control pupils some direction in what to study.²⁵

The study of the experimental pupils differed from that of the 
control pupils in that the former received mimeographed sheets of 
study questions with each assignment. These study questions re- 
sembled very much in form the items on various kinds of new-type 
tests. The pupils of both groups studied in the same school environ- 
ment and had access to the textbook only during the study period. 
At the end of two study periods, and after twenty minutes of review 
of the text and study sheets by the experimental pupils, and of the 
text alone by the control pupils, the books and study sheets were 
collected and a true-false test administered. The groups were rotated 
at the end of the week and the experiment continued so that the
former unsupervised group received supervision and the former
supervised group went unsupervised. The results were combined for
two-week units. That is to say, the means of the scores on the final
true-false tests for the unsupervised weeks of both groups were added
together, and the result subtracted from the combined means for the
supervised weeks of both groups. The duration of the entire experi- 
ment at the Speyer School was six weeks. Reeder supplemented his 
results with similar experiments at other schools, one of which lasted 


²³ Johnson, op. cit., p. 135.
six weeks, and three which lasted but two. The data were interpreted
for each of the two-week units of all of the experiments in terms
of the experimental coefficient, and summaries are given for the re- 
sults at different schools and for all of the experimentation. Reeder 
reports a final experimental coefficient of 1.62, indicating that the
chances in favor of the superiority of his method of supervision as 
compared with its absence are as 65,000 to 1. 

13. Douglass²⁶ has reported an experiment in which ten pairs of
groups, averaging fourteen pupils each, were selected and carefully 
paired on the basis of age and a composite of intelligence test score and 
achievement test score. Some of the groups were taught by members 
of the regular high-school staff which consisted “largely of mature, 
progressive, and somewhat superior teachers.”²⁷ The majority of the
groups were taught by practice teachers. The author claims that 
there was nothing about the teaching staff, the personnel of the groups 
used in the experiment, or the equipment of the school that would 
render the results not typical of those which could be obtained in the 
ordinary high school. 
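Pairing of the kind Douglass used can be sketched as follows. This is a hypothetical illustration, not his procedure: the field names, the equal weighting of the two standard scores in the composite, the use of whole years of age as the matching criterion, and the coin toss that sends one member of each pair to each group are all assumptions. Pupils are matched only within the same age, and within an age the neighbours in composite order are paired.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores (mean 0, SD 1) for a list of raw scores."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def matched_pairs(pupils, seed=0):
    """Pair pupils of the same age whose composite scores are closest.

    pupils: list of dicts with hypothetical keys 'name', 'age' (whole
    years), 'iq' (intelligence test score) and 'ach' (achievement test
    score). The composite is the mean of the two standard scores.
    Returns (group_x, group_y, unmatched); one member of each pair is
    assigned to each group by a coin toss.
    """
    zi = z_scores([p["iq"] for p in pupils])
    za = z_scores([p["ach"] for p in pupils])
    composite = [(a + b) / 2 for a, b in zip(zi, za)]

    by_age = defaultdict(list)
    for i, p in enumerate(pupils):
        by_age[p["age"]].append(i)

    rng = random.Random(seed)
    group_x, group_y, unmatched = [], [], []
    for age in sorted(by_age):
        # Neighbours in composite order are the closest matches.
        idx = sorted(by_age[age], key=lambda i: composite[i])
        if len(idx) % 2:
            # The odd pupil out (here the highest scorer) is left unpaired.
            unmatched.append(pupils[idx.pop()])
        for k in range(0, len(idx), 2):
            first, second = pupils[idx[k]], pupils[idx[k + 1]]
            if rng.random() < 0.5:
                first, second = second, first
            group_x.append(first)
            group_y.append(second)
    return group_x, group_y, unmatched
```

With five pupils aged 14 and 15, for instance, the sketch forms two pairs and leaves the odd fifteen-year-old unmatched. Pairing on these two criteria alone does not guarantee equivalence in study habits or the other pupil factors discussed in this bulletin; those would still have to be checked.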

The problem was the determination of the relative effectiveness of 
the study-recite sequence in supervised study as compared with the 
recite-study sequence. The pupils of each pair of groups were sub- 
jected to the same supervised-study procedures during the portion of 
the period devoted to study. They had the same instructional pro- 
cedures and materials during the recitation. The only difference was 
that in one group the pupils recited and then studied their lesson 
for the next day, while in the other group they studied the lesson 
first, and followed their study with immediate recitation. In order 
to equate teacher factors and those of room environment, teachers 
and rooms were exchanged at the mid-point of the experiment. At the 
end of eleven weeks, final achievement tests were administered. These 
tests were similar in form and content to the subject-matter tests 
administered at the beginning of the experiment, and in one case the
same test was repeated. The coefficients of reliability were determined 
for all of the tests used and were found to range from .614 to .943. 
In the interpretation of data, appropriate statistical procedures were 
used. The following conclusions are reported in the monograph: 


1. It cannot be said that either an R-S or an S-R sequence is more 
effective than the other for all classes or types of work. 


26Douglass, H. R. “The Experimental Comparison of the Relative Effectiveness of Two
Sequences a Supervised Study,” University of Oregon Publications, Vol. 1, No. 4. Eugene: 
University of Oregon, 1927, p. 173-218. 
2. The S-R sequence is more effective in classes in history and social
science and in literature than the R-S sequence. 
3. The R-S sequence is more effective in classes in mathematics and 
science, though the superiority may not be manifest in each class. 
4. The relative effectiveness of the S-R sequence for history and social 
science and English classes is greater for classes in grades above the 
eighth than for seventh and eighth grade classes. 

5. Teachers’ judgments of the relative effectiveness of two methods 
are not to be taken too seriously. 

6. Neither sequence operates to produce greater variability in progress 
than the other, generally or in any particular subject or school grade. 

7. Neither sequence is peculiarly favorable for more capable or for
the less capable student. 


Evaluation of the techniques employed in the conduct of the ex- 
periments in supervised study. 1. Specification of supervised study 
as an experimental factor. There has been little agreement among the 
experimenters in this field as to just what particular procedures con- 
stitute supervised study. An examination of the reports shows that 
some have emphasized guidance of pupils while studying as the chief 
procedure. An example of this is to be found in the experiment of 
Breslich. The teacher passed about the room watching pupils at work, 
making suggestions but rarely answering questions directly, giving no 
help until a serious effort had been made by the pupil, stopping the 
whole class when mistakes were discovered which might become gen- 
eral, and adapting the guidance to meet the needs of the moment or 
the individual.²⁹ Other experimenters have emphasized instruction in
the techniques of learning, or methods of study as the chief procedure 
of supervised study. Beauchamp gave the experimental group train- 
ing in determining the central idea of a paragraph, in organizing their 
thinking about the central idea, in finding and answering questions 
relative to the material assigned, in reading through a whole assignment
before beginning an analytical study of its parts, and in solving
thought questions.³⁰

In many of the reports of the experiments in this field, the descriptions
of the procedures used are inadequate. For example, instruction
in methods of study was the chief procedure employed by the teachers
in the cooperative experiment of Breed, but the report does not present
information with respect to just what these methods were, other than
that they were taken from Whipple.³¹ The results of the different
experiments cannot be compared, nor can a general conclusion to the
experimentation in this field be synthesized, because of the variety,
complexity, and inadequacy of description of the experimental factors. In


28Douglass, op. cit., p. 218.
29Breslich, op. cit., p. 508.
30Beauchamp, op. cit., p. 49.
31Breed, op. cit., p. 196.
many cases it is impossible to ascribe the results obtained to specific
supervised-study procedures. 

2. Equivalence of groups. A variety of techniques was employed
in the controlled supervised-study experiments in an effort to secure
equivalent groups. These techniques range from shifting pupils so
that the averages of the two groups were as nearly the same as possible
with respect to school-subject marks of the preceding semester, or
scores on an informal preliminary test, in the experiment of Breed,³² to
the pairing of pupils on the basis of chronological age and a composite
of regressed intelligence and achievement test scores in the experiment
of Douglass.³³ Beauchamp relied on such criteria as intelligence test
scores and scores on informal reading rate and comprehension tests in
showing the equivalence of his groups.³⁴ Reeder assumed that his
groups were equivalent because they were rotated at the mid-point of 
the experiment and because the pupils had no specific initial knowl- 
edge of the subject-matter to be studied. 

Beauchamp is the only experimenter who came near to recognizing
the importance of study habits as one of the criteria of equivalence.³⁵
Douglass³⁶ approached closest to the requirements of a precise experiment
in that he made use of three of the criteria, or characteristics of
pupil material, and did so on the basis of regressed scores.³⁷ None of
the experimenters paired pupils on the basis of intelligence test scores 
and later checked the equivalence thus secured with respect to pre- 
vious achievement, chronological age, study habits, personality traits, 
physical condition, sex, and race in the manner suggested in the dis- 
cussion of the control of pupil factors. It is probable that the earlier 
experiments should not be condemned too severely for this, since these 
techniques had not been developed. 
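The reports cited do not reproduce the formula behind Douglass' "regressed scores." A common interpretation, assumed here, is Kelley's regression estimate of true score, in which an obtained score is pulled toward the group mean in proportion to the unreliability of the test. A minimal Python sketch (the function name and figures are hypothetical):

```python
def regressed_score(obtained: float, reliability: float, group_mean: float) -> float:
    """Kelley's regression estimate of a pupil's true score.

    The obtained score is pulled toward the group mean in proportion
    to the test's unreliability: r * X + (1 - r) * M.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return reliability * obtained + (1.0 - reliability) * group_mean


if __name__ == "__main__":
    # Hypothetical figures: a score of 120 in a group whose mean is 100.
    print(regressed_score(120, 0.8, 100))  # about 116
    print(regressed_score(120, 0.5, 100))  # 110.0
```

Regressing before pairing keeps two pupils from being matched on the portion of an extreme score that is error of measurement, which cannot be expected to reappear on the final test.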

3. Control of teacher factors. Some of the techniques employed 
for the control of instructional procedures are excellent. For example, 
Breed provided the teachers who participated in his experiment with 
mimeographed copies of the general directions to be followed in the 
conduct of the experiment.³⁸ Douglass insured a clear understanding
on the part of the teachers of the procedures to be employed by hold- 
ing a meeting at which the procedures were explained and questions 
relative to them answered.³⁹ The teachers were also given a summary
of instructions for the conduct of the experiment and were asked to 

32Breed, op. cit., p. 192.

33Douglass, op. cit., p. 177-83. 

34Beauchamp, op. cit., p. 50-54.
keep an experimental log in which they recorded all absences,
interruptions of class work, and special distractions which might
influence the progress of the pupils. Reeder⁴⁰ eliminated the teacher's
instructional procedures as a factor by administering supervision of
study in the form of mimeographed sheets of study questions with each
assignment. The control pupils, of course, did not receive these study
questions.⁴¹ An
attempt to control zeal and effort is indicated in the following quota- 
tion from Heckert: 


In both groups, the teacher attempted to arouse as much enthusiasm for 
the work as possible in order that every child might do his best. In this she 
succeeded for we never had keener interest in the writing of compositions.⁴²


There is abundant evidence of failure to control certain teacher 
factors in many of the experiments. In the experiment of Heckert⁴³
the teacher was assisted by the experimenter in the administration of 
the supervised-study procedures. It is logical to assume that the 
pupils were stimulated not only by the supervised-study procedures, 
but also by the superior skill and presence of the experimenter. 
Among the directions given the cooperating principals by Breed is the 
following: 


Select for the supervision of study teachers who are known to be interested 
(a) in undertaking the experiment, and (b) in teaching pupils how to study.⁴⁴


This suggestion could not help but introduce faulty control of the 
teacher factor of zeal and perhaps also of skill. The teachers were 
selected for their bias in favor of the experimental factor. The criti- 
cal reader secures the impression from a careful examination of the 
reports of these experiments that the zeal of the teachers for the 
novel supervised-study procedures was in most cases an uncontrolled 
factor of sufficient influence to produce the apparent superiority of 
the experimental factor. 

While the teachers in the experiment of Breed were selected for 
their interest in supervised study, there is no other evidence, in this 
case, that they were not typical of high-school teachers in general. 
The teachers of Douglass’ experiment do not seem to be representative 
of the profession, since they were either practice teachers or regular 
members of the staff of a university high school.⁴⁵ The same statement
may be made with respect to the experiments of Breslich and
Beauchamp, although no practice teachers were used. In the rest of 
the reports the information concerning teachers is too meager to per- 
mit judgments relative to the representativeness of the teachers. 
4. Control of general and extra-school factors. Evidence is presented
in the reports of Reeder and Douglass to show that care was
taken to see that the experimental and control pupils had identical 
materials of instruction. For example, Douglass states in the instructions
to teachers: “Not only should the two sections cover the same
material during the experimental period, but they should be kept
together and should progress at the same rate.”⁴⁶ Reeder permitted
the experimental and control pupils to have access to the textbook
only during the study periods.⁴⁷ It is probable that some of the other
experimenters exercised similar care, although no information was
given to prove that this was the case. The failure to have both groups
of pupils study in comparable⁴⁸ environments, as is evident in the
experiments of Breed,⁴⁹ Breslich,⁵⁰ Brown and Worthington,⁵¹ Johnson,⁵²
and Minnick,⁵³ is likely to have caused variation in this factor. The
experimenters just referred to are to be criticized for not controlling 
adequately the factor of materials of instruction and for failure to 
eliminate the general and extra-school factors inherent in non-com- 
parable study environments. 

The study environment of the experimental pupils also differed in
another respect from that of the control pupils who studied in a study
hall or at home: the experimental pupils studied in the presence of
their teacher, and they probably were stimulated by the mere presence
of the teacher. The control pupils who studied in a study hall had this
stimulation to a lesser degree, but the control pupils who studied at
home had it or lacked it according to the character of their parents.
Another aspect of the
failure to control the study environment is its effect on the time 
factor. Where the experimental and control pupils studied in com- 
parable environments, the time factor was also controlled. In the 
experiments of Beauchamp,⁵⁴ Douglass,⁵⁵ Dunn,⁵⁶ Heckert,⁵⁷ and
Reeder⁵⁸ the same amount of time was set aside each day for study
on the part of the experimental and control pupils. In the experiment 


46Douglass, op. cit., a 
of Heckert⁵⁹ the control children studied at home, but care was taken
to prevent them from spending more time at study than the supervised
pupils. In the experiment of Reeder⁶⁰ both groups studied for
the same amount of time at school. Additional study was prevented 
by collection of the textbooks at the close of the study period. With 
the possible exception of the experiment of Reeder, one cannot be 
sure that the members of the supervised group did not do some of the 
studying outside of the classroom. In the experiment of Johnson the 
failure to control a factor of school organization is indicated by the 
information that the experimental pupils were divided into ability 
groups, while the control pupils were not.⁶¹

Failure to control the general and extra-school factors satisfac- 
torily might be justified by the experimenters on the grounds that 
they were comparing supervised-study procedures with the traditional 
absence of supervision and, for this reason, should not have had the 
control pupils study anywhere else but in the study hall or at home. 
There is some justification for this view, since the conditions of
experimentation are more comparable to ordinary school conditions.
However, such conclusions are not as valuable as results secured 
under more precise control of conditions. In the precise experiment 
the results may be ascribed to the supervised-study procedures alone, 
while in the experiments described as lacking this precision, the differ- 
ence in gains must be ascribed to a complex of factors including super- 
vised-study procedures, instructional materials, presence of the teacher, 
and home conditions. 

Evidence has already been presented of failure to control im- 
portant non-experimental factors. For example, Heckert did not 
recognize that he was introducing an uncontrolled experimental factor 
by participating in the instruction of the supervised pupils. Johnson 
did not recognize the presence of an uncontrolled factor when the 
experimental group was divided into ability groups, while the control 
group was not. Minnick is to be criticized for permitting his con- 
trol pupils to consult the teacher during his conference hours, thus 
obtaining a measure of supervision and at the same time introducing 
an unrecognized uncontrolled factor. The experimenters who failed to 
control the time factor also failed to recognize its importance in 
interpreting the data. It has already been stated that failure to con- 
trol such important teacher factors as skill and zeal is evident “be- 
tween the lines” in many of the reports of the experiments. None of 
the experimenters recognized that the superiority shown for the
supervised-study procedures may have been due to the enthusiasm of the
teacher for new procedures.⁶² None of the experimenters can be credited
with recognizing lack of control to an adequate extent.

In view of the evidence of the potency of the educative factors
designated as “important,”⁶³ especially the skill and zeal of the
teacher with reference to the experimental factor, it seems reasonable
to say that the failure to control the non-experimental factors completely
should have received explicit recognition in interpreting the difference
in gains in each of the experiments in this group.

5. Measurement and interpretation of differences in gains in 
achievement and in the acquisition of study habits. A variety of tests
were used to measure achievement in the school subjects in which 
supervised study was tried as an experimental factor. Brown and 
Worthington, Heck, Heckert, and Douglass were the only experimenters
to use standardized tests, or tests of known reliability.⁶⁴
Beauchamp and Reeder used new-type tests of their own construction 
but made no effort to show that these tests were reliable. The rest 
of the experimenters depended on ordinary school marks on recita- 
tions, monthly quizzes, or traditional examinations. 

Earhart and Beauchamp were the only experimenters to attempt 
a direct measurement of study habits. The former administered 
sample lessons with questions relative to procedures that would be
employed by the child in studying them; the latter made an examina- 
tion of the study notes of his supervised and unsupervised pupils. 
Breed, Breslich, and Johnson sought to measure the acquisition of 
study habits by measurement of achievement after rotation on the 
assumption that retention of superiority by the former supervised 
pupils would indicate acquisition of good study habits. Minnick 
measured the ability to solve new problems for the same purpose. 
Brown and Worthington, Douglass, Dunn, Earhart, Heck, Heckert, 
and White made no effort to measure the acquisition of study habits. 

None of the experimenters made use of the formulae⁶⁵ that yield
indices of the variable errors of measurement. Reeder⁶⁶ and Douglass⁶⁷
made use of the formula⁶⁸ that gives the combined allowance for
variable errors of measurement and for sampling, and may be credited
with having recognized and allowed for the variable errors of measurement,
even though they did not differentiate the variable errors of
measurement from those of sampling. In none of the other experi- 
ments was any attention given to the variable errors of measurement. 
It is probable that the earlier experimenters should not be criticized 
severely for this omission since the formulae may not have been 
accessible to them. It is certain, however, that the findings of most 
of the experiments in supervised study are less dependable because of 
the neglect to minimize, or account for, variable errors of measure- 
ment. 
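The distinction drawn above between variable errors of measurement and errors of sampling can be made concrete with standard textbook formulas. The sketch below is an illustration, not the formulas Reeder or Douglass printed; all names and figures are hypothetical. It computes the standard error of a difference between two means from obtained standard deviations, which already contain variable errors of measurement as well as sampling variation. Separately, it computes the standard error of measurement of a single score from a reliability coefficient.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean from the obtained standard deviation."""
    return sd / math.sqrt(n)


def se_difference(sd1: float, n1: int, sd2: float, n2: int, r: float = 0.0) -> float:
    """Standard error of the difference between two means.

    r is the correlation between the paired scores of matched groups;
    use r = 0 for independent groups. Because sd1 and sd2 are obtained
    SDs, they already contain variable errors of measurement as well as
    sampling variation.
    """
    s1, s2 = se_mean(sd1, n1), se_mean(sd2, n2)
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)


def se_measurement(sd: float, reliability: float) -> float:
    """Standard error of measurement of an individual score: SD * sqrt(1 - r11)."""
    return sd * math.sqrt(1.0 - reliability)


if __name__ == "__main__":
    # Hypothetical groups of 25 pupils, each with an obtained SD of 10.
    print(se_difference(10, 25, 10, 25))         # about 2.83 (independent)
    print(se_difference(10, 25, 10, 25, r=0.5))  # 2.0 (matched pairs)
    # Reliabilities at the two ends of the range Douglass reports.
    print(se_measurement(10, 0.614), se_measurement(10, 0.943))
```

Matching reduces the standard error of the difference only to the extent that the paired scores are correlated, and a less reliable test (the .614 end of the range) leaves a much larger error attached to each pupil's score.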

Douglass is the only experimenter in this field who has included 
reference to systematic errors in his report. He states: 


Those familiar with statistical procedure realize that the obtained gains 
from studies of this sort may be the results of a real difference in the experi- 
mental factors, chance errors of measurement and of sampling, or systematic 
errors from those sources such as might occur in the selection of experimental 
groups or in the failure to rule out or hold constant some factor favoring one 
group or the other. As has been pointed out, great care was exercised to
prevent the operation of such systematic errors.⁶⁹


No mention is made in the reports of any of the experiments con- 
cerning the precautions taken to insure identical testing conditions, 
unless the above quotation of Douglass is taken to mean that he main- 
tained such conditions. The neglect to mention such precautions leads 
to the inference that identical testing conditions were not maintained 
in the other experiments. The probable consequence is that unrec- 
ognized systematic errors of measurement have influenced the results 
in unknown, though probably small, amounts. 

No attention was given by any of the experimenters to variable or 
systematic errors of validity. The significance of this fact becomes 
apparent when we note that with the exception of the experiments of 
Earhart and Beauchamp no attempt was made to measure the acquisition
of study habits directly. Breed, Breslich, Johnson, and Minnick
sought to measure the acquisition of these abilities indirectly, but it is
reasonably certain that such indirect measures were grossly lacking in
validity.⁷⁰ If the merit of supervised study is judged, as in most of
the experiments described, merely on the basis of differences in school 
achievement in the immediate presence or absence of supervised study, 
and no account is taken of differences in acquisition or retention of 
study habits, or of the differences in school achievement long after 
supervision has been removed, the conclusion arrived at is likely to be 
unjust. Systematic errors of validity of this nature have not been 
adequately accounted for in these supervised-study experiments, and
because of this fault alone many of the conclusions seem undependable.

6. Generalization. No evidence is given in the reports of the ex- 
periments to show that the groups were selected in a random fashion 
from the population of school children to which the generalizations 
are applied. It is evident from an examination of the reports that all 
the groups were ordinary school classes; therefore, whatever non- 
representativeness was present may not be ascribed to random sam- 
pling. It is probable that the groups used by Breed and by Brown and 
Worthington may be considered more typical of high-school children 
in general than the groups used in any of the other experiments. 
Fourteen high schools in cities of various sizes participated in the
experiment of Breed,⁷¹ while four high schools in cities of moderate size
cooperated in the experiment of Brown and Worthington.⁷² The
experiments of Breslich,⁷³ Beauchamp,⁷⁴ and Douglass were conducted
in a university high school. Douglass sought to show that in spite of 
the character of the high school in which his experiment took place, 
the pupils were typical. He states: 

This school is a laboratory for the school of education of the University. 
It has the six-year type of organization, comprising grades seven through twelve. 
Pupils are accepted for registration on the basis of priority of application. 
Great effort is expended to maintain a representative student personnel. Not 
more than two-thirds of any one grade may be of one sex. Athletics and other 
activities are maintained with a view to attracting a representative group of 
young people. The entrance requirements are identical with the junior and 
senior high schools of Eugene. The school is in no way a special preparatory
school for the University, and the average age and range of age-ability is
approximately equal to that of the typical Oregon high school.⁷⁵


Douglass is to be highly commended for presenting this informa- 
tion in the report of his experiment, but the authors of this study can- 
not feel that he is quite justified in making the following statement: 


There is nothing about the student or teaching personnel or about the 
equipment of the school which would give ground for a belief that whatever 
experimental results have been obtained could not be expected to be typical 
of results obtained in the ordinary high school of over two or three teachers.⁷⁶


The fact that the pupils had to apply for admission, pay fees, and 
attend school in proximity to a university is sufficient to render them 
somewhat non-representative of high-school pupils in general. Such 
environmental influences could not help but engender attitudes toward 
school work that would be different from those of the typical
high-school pupil.

The evidence presented in the first of the quotations of Douglass 
is the most complete that has been given in any of the experiments 
relative to the degree of representativeness of the pupils. Most of the 
experimenters have ignored this important aspect of experimentation 
altogether. 

Douglass and Reeder sought to allow for non-representativeness of 
the groups of pupils by the use of formulae. It is probable that they 
were partially justified in using these formulae in that recognition was 
thus given, unknowingly, to variable errors of measurement.⁷⁷ They
do not seem to have been justified in using these formulae as a means 
of allowing for non-representativeness, because their groups were not 
selected in a random fashion. In each case they were ordinary school 
classes. Douglass presents arguments to prove that his groups were 
representative. If they were representative, then the use of the
formulae was futile, since the means of representative groups are the
same as those of the populations from which they are drawn. If the groups
were not representative, and we have reason to believe they were not, 
Douglass should have estimated the effect of non-representativeness 
on his gains and restricted his conclusions accordingly. Reeder and 
all of the other experimenters should have done the same thing. 

The conclusions quoted in the foregoing descriptions of the experiments
in supervised study are, for the most part, stated as generalizations.
The previous discussion of the shortcomings of these experiments
indicates how undependable the data are on which these
generalizations are based. None of the experimenters made use of 
random or truly representative groups, and yet none of the conclusions 
have been more than slightly restricted because of this. 


The dependability of generalizations found in the reports of the 
controlled experiments to determine the merit of supervised study. 
An examination of the achievement differences reported in the con- 
trolled experiments to determine the merit of supervised study reveals 
wide variation from positive results in its favor to negative results in 
opposition. The majority of differences in achievement seem to favor 
the superiority of supervised study. Some of these positive differences 
are so large that it would seem probable that if allowances had been 
made for variable errors of measurement, validity, and sampling, the 


77It has been shown that the error-of-difference formulae in which the σ of the obtained scores is used to
determine the values inserted under the radical recognize both errors of sampling and errors
of measurement. See p. 72.
chances would yet be strongly in favor of the significance of the
difference. However, many of these highly positive results in favor of
supervised study seem questionable. It is all too evident that the 
majority of the experimenters, or the teachers used by them, were 
zealous for the experimental factor. In many cases, it is obvious that 
the pupils of the supervised groups received not only instruction in 
how to study, but much more instruction in the subject-matter than 
the unsupervised pupils. The previous discussion of the techniques 
employed in the experiments described substantiates the contention 
that the reported differences are decidedly questionable. In certain 
experiments the negative differences reported after rotation would also 
seem to favor supervised study. The former supervised pupils re- 
tained some of the habits acquired when supervised study was applied 
to them, and this retention was effective in reducing the difference in 
achievement after rotation. 
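Statements of the "chances" in favor of a difference, as in these reports, rest on dividing the obtained difference by its standard error. A minimal Python sketch follows. It assumes a normal sampling distribution and a one-tailed reading; older reports worked with probable errors and their own tables, so odds computed this way need not match published figures. The function names and numbers are hypothetical. Such odds allow only for variable errors. They say nothing about systematic errors such as the zeal of the teacher, which is the difficulty raised above.

```python
import math


def critical_ratio(difference: float, se_difference: float) -> float:
    """Obtained difference divided by its standard error."""
    return difference / se_difference


def chances_to_one(ratio: float) -> float:
    """Odds that the true difference has the same sign as the obtained one.

    Assumes the sampling distribution of the difference is normal and
    reads the ratio one-tailed: the tail beyond `ratio` standard errors
    is the chance of error, the rest the chance of a real difference.
    """
    p_tail = 0.5 * math.erfc(ratio / math.sqrt(2.0))
    return (1.0 - p_tail) / p_tail


if __name__ == "__main__":
    # Hypothetical: a difference of 6 points with a standard error of 2.
    cr = critical_ratio(6.0, 2.0)
    print(cr)                  # 3.0
    print(chances_to_one(cr))  # roughly 740 to 1
```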

The differences reported in the experiments of Breed⁷⁸ and Brown
and Worthington⁷⁹ are for the most part small and are approximately
equally divided for and against supervised study. The negative dif- 
ferences are difficult to explain. An examination of the report of the 
experiment of Brown and Worthington showed that the negative dif- 
ferences were reported only for the groups in which intelligence tests 
were not used to secure equivalence. Where intelligence tests were 
used, the difference was in favor of supervised study. The groups used 
in the experiment of Breed were equated merely on the basis of marks 
on informal preliminary tests prepared by the cooperating teachers or 
on the basis of their previous semester grades in the school subject. 
The negative differences may have been due to failure to secure 
equivalence. 

The generalizations based on such differences cannot be more de- 
pendable than the differences themselves. Conclusions favorable to 
supervised study are not applicable to other schools or classes, because 
one cannot be sure that the difference in achievement was due to 
supervised study and not something else. Conclusions unfavorable to 
supervised study do not seem necessarily a condemnation of it, since 
it is evident that faulty equivalence may have been responsible for the 
negative differences. Since the conclusions and generalizations of the 
individual experiments are of this nature, it is impossible to synthesize 
a general conclusion to all of the experiments. Such a synthesis might 
well be a summation of errors rather than an approach to truth. 


78Breed, op. cit., p. 281-83.
79Brown and Worthington, op. cit., p. 605-9.
Concluding statement. The preceding critical evaluation of experiments
relating to supervised study has revealed the very meager
contributions that these studies have produced. In every case a 
critical examination of the experimental procedure has revealed faults 
that make the interpretation of the obtained difference in gains un- 
certain. If we assume that it is desirable for pupils to conform to 
certain study procedures and that the controls of conduct which will 
insure the desired conformity can be acquired only as the result of 
instruction, there arises the problem of determining the most effective
plan for giving this instruction. There are several possible plans: 
supervised study, special course in how to study, distribution of 
printed or mimeographed rules and directions for study, suggestions 
and directions given orally in connection with assignments, and inci- 
dental instruction. The problem to be attacked is that of determining 
the relative effectiveness of supervised study in comparison with other 
promising plans. Unless we assume that supervised study may not 
be pedagogically sound, an experiment to determine the relative ef- 
fectiveness of supervised study and no instruction relative to study 
procedures is an attempt to prove the obvious. To make this assump- 
tion seems absurd. Hence, with the exception of the work done by 
Douglass the supervised-study experiments evaluated in the preceding 
pages have been attempts to prove the obvious. Their principal value
has been the training of the participating teachers in certain proce- 
dures for engendering study habits and the stimulation of interest in 
supervised study. If the experiments had been planned so as to com- 
pare two plans of instruction designed to engender study habits, they 
might have added to our knowledge about how to instruct pupils 
in regard to methods of study. 




CHAPTER V 


EXPERIMENTATION AS A PROCEDURE IN 
EDUCATIONAL RESEARCH 


Is educational research in a plateau period? In Chapter I the de- 
velopment of experimentation in the field of education was traced 
briefly, and statements by a number of writers were quoted as evi- 
dence of a rather general belief that this type of educational research 
has not yielded much in the form of dependable conclusions about the 
effectiveness of educational procedures. In commenting on these
quotations, the suggestion was made that, perhaps, educational
experimentation has reached a plateau period of development. The
exposition of the procedure of experimentation in Chapters II and III
and the evaluation of a group of controlled experiments in Chapter 
IV provide a partial basis for an answer to the implied question. Be- 
fore setting forth the judgment of the present writers, statements from 
a few recent writers will be noted. 

Whipple states in regard to experimental techniques “. . . that
it is only recently that research in education has arrived at the
employment of some of the most obvious principles of scientific
procedure.”¹ Gates and Barr, in the following quotations, seem to
believe that progress is beginning again because of better
techniques:

Three years ago Dean Henmon and others deplored what then appeared 
to be a serious neglect of experimental studies of the learning and teaching 
process. It is gratifying to say that since that time scientific workers have 
shown a renewed activity in these fields of research. .... Within the last two 
years there have been gratifying advances in the study of the principles
underlying efficiency in learning.²

The experimental study of education is passing out of the play, manipu- 
lative, or exploratory state. Better acquaintance with experimental methods 
should bring better research. It seems that there are a number of practices 
found in reports of careful research workers that should enjoy more general
acceptance.³

It is also the opinion of the writer that the quality and the accuracy of 
statistical writing have improved very greatly in the last few years. One has 
only to examine the numbers of an educational journal five years ago and 
compare them with those of the present year to be convinced of this.⁴


¹Whipple, G. M. “The Improvement of Educational Research,” School and Society,
26:252, August 27, 1927.
²Gates, A. I. “Recent Advances in Educational Psychology,” School and Society, 29:2-3,
January 5, 1929. (An address before the American Association for the Advancement of
Science, December 29, 1928.)
³“. . . Research,” Journal of Educational Research, 19:56, January, 1929. (An editorial.
The editorial goes on to point out “the more obvious practices of careful research . . .”)
⁴[Surname illegible], K. J. “Accuracy in Calculation,” The Elementary School Journal,
29:516, March, 1929.
The number of such statements is less than that of the critical
statements quoted on pages 13-15. This suggests that there is no gen- 
eral agreement in regard to the future of experimentation in the field of 
education. To the present writers it appears reasonably certain that 
numerous refinements of technique will be introduced and that, con- 
sequently, the quality of experimental research will be greatly im- 
proved. The ultimate status is another question. Before venturing a 
prediction, it is desirable to point out certain inherent limitations of 
the experimental method and certain crucial difficulties. 

Inherent limitations of the experimental method. Experimenta- 
tion in common with other methods of research can never tell us what 
should be. The object of all research is to test ideas or hypotheses. 
Experimentation may tell us, perhaps, which of two methods is the 
better if certain criteria are assumed. Philosophy rather than ex- 
perimentation must be used to tell whether or not a method should 
be used at all. The appropriateness of a method depends on the 
character of the individuals desired as a result of the educative 
process. For example, research cannot tell us whether or not 
gifted children should be educated differently from other children until
it is decided whether or not it is desirable to have them different. 
Such questions are matters of value and belong to the field of philoso- 
phy rather than to science. 

Another limitation of experimentation is that final answers to 
educational problems may not be desirable. It is evident to anyone 
who has given the matter critical thought that even the most perfect
learning experiment, giving the most conclusive results and
considered as having solved the problem, may yet be detrimental when
applied to practice. Teaching is a dynamic process, and if the teacher
is satisfied to apply an experimentally determined “best” method
year after year without change, it seems reasonable to expect that in
time its effectiveness will fall below that of other methods which were
experimentally shown to be inferior but which would be used with greater
enthusiasm.

Another weakness of experimentation, though perhaps of less in- 
herent nature, is that the results obtained apply to the typical rather 
than to the atypical child. It may be said that group experimentation 
tells us what to do for the child who needs help least. The emphasis 
on averages tends to make us neglect the problems of the
individual. The feeling that conclusive results may be obtained only with
large groups has led us to neglect the study of the individuals who
make up these groups. The careful observation of a single individual
may yield more valuable knowledge in regard to the phenomena of
learning than hundreds of carelessly performed experiments on large 
groups. A hundred individuals may be just as unrepresentative a 
sample of all individuals as a single one. 

Crucial difficulties in educational experimentation. Each item in 
the procedure of experimentation in the field of education involves a 
difficulty to be overcome, but certain difficulties appear to be much 
more serious than others. The present writers believe that four may 
appropriately be designated as crucial. In giving this designation, 
they recognize that the seriousness of these difficulties varies, but in 
general the designation appears to be justified. 

1. Definition of experimental factor and adjustment of other
educative factors to it. Precise definition of the experimental factor is
essential in order to give definite meaning to the findings. It cannot
be very meaningful to prove that Method A is superior to Method B
if the investigator can define these methods only by saying that they
are the methods carried out in the experiment. Precise definition is
not always easy, but reasonably satisfactory statements may be
secured by specifying in writing the details of the experimental
procedure prior to the beginning of the experiment or by keeping a
detailed record of the procedure actually carried out.

There is, however, another aspect of this difficulty. In order to 
secure results that have a maximum of practical value, it is not suffi- 
cient merely to make a precise specification of the experimental factor. 
The procedure specified must be one that is adjusted to other educative
factors in a way that is compatible with sound educational practice. 
This means that the combination of all educative procedures, both 
experimental and non-experimental, must be one that is effective. For 
example, consider supervised study as an experimental factor. It is 
reasonable to assume that the maximum effectiveness of this instruc- 
tional procedure will be attained only when it is combined with certain 
types of assignments, recitational activities, and perhaps textbooks. 
In other words, the effectiveness of supervised study depends upon the 
other instructional procedures with which it is combined and not 
merely upon the techniques of which it consists. 

A more striking illustration of this aspect of the difficulty is 
afforded by an experiment to determine the effect of class size upon 
pupil achievement. It appears reasonable to say that the teaching 
of a class of fifty pupils should involve procedures that differ in some 
respects from those that are most effective with classes of fifteen to 
twenty pupils. Hence when size of class is made the experimental
factor, it is necessary to have the accompanying instructional procedures
adjusted to the size of the particular classes included in the experiment. 
If this is not done, the experimental factor will not be tested out under 
optimum conditions, and the findings will have only limited signifi- 
cance. The important problem relating to the effect of class size is, 
“What will be the effect of organizing a school into large classes
rather than into small classes?” rather than, “What is the effect of 
a few large classes that are balanced by small classes?” In order to 
have typical large-class conditions, it is necessary to have a school 
organized into large classes, and this in turn will mean a large pupil- 
teacher ratio unless the number of classes per teacher is reduced in 
proportion. Similarly, typical small-class conditions will involve a 
small pupil-teacher ratio unless the number of classes per teacher is 
increased in proportion. As usually thought of, large classes 
are understood to mean a large pupil-teacher ratio and small classes, 
a small pupil-teacher ratio. Hence in setting up a class-size experi- 
ment, it is not sufficient merely to organize a few large classes and a 
corresponding number of small classes. The pupil-teacher ratio must 
be comparable to the size of class. 
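The arithmetic behind the link between class size and pupil-teacher ratio can be made explicit. The following is a minimal sketch, assuming a simplified school in which every pupil attends the same number of classes and every teacher teaches the same number of classes; the function names and all figures are hypothetical illustrations, not data from any experiment discussed here.

```python
def pupil_teacher_ratio(class_size, classes_per_teacher, classes_per_pupil):
    """Pupils per teacher in a simplified, hypothetical school model.

    Every class section has `class_size` pupils, every teacher teaches
    `classes_per_teacher` sections, and every pupil attends
    `classes_per_pupil` sections.
    """
    # Total enrollments = pupils * classes_per_pupil = sections * class_size.
    # Teachers = sections / classes_per_teacher.
    # Hence pupils / teachers = class_size * classes_per_teacher / classes_per_pupil.
    return class_size * classes_per_teacher / classes_per_pupil


def classes_per_teacher_for_ratio(target_ratio, class_size, classes_per_pupil):
    """Teaching load that holds the pupil-teacher ratio at `target_ratio`."""
    return target_ratio * classes_per_pupil / class_size


# Hypothetical school: each pupil attends 5 classes; each teacher teaches 5.
small = pupil_teacher_ratio(20, 5, 5)   # classes of 20 -> 20 pupils per teacher
large = pupil_teacher_ratio(50, 5, 5)   # classes of 50 -> 50 pupils per teacher
# Classes of 50 with the ratio held at 20 would require a load of 2 classes.
load = classes_per_teacher_for_ratio(20, 50, 5)
print(small, large, load)
```

Under these assumptions, enlarging classes from 20 to 50 pupils raises the pupil-teacher ratio in the same proportion unless each teacher's load falls from 5 classes to 2. An experiment that forms a few large classes while leaving teacher loads unchanged therefore does not reproduce typical large-class conditions in this sense.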

2. Control of non-experimental factors. The more important edu- 
cative factors were identified in Chapter II, and the difficulty of con- 
trolling them, especially the less tangible ones such as teacher zeal, 
is so apparent that an extended discussion is unnecessary here. As 
pointed out on pages 52-56, control of certain factors has been at- 
tempted by employing the rotation method. When employing this 
method, as well as other procedures for securing control, it is important 
to make certain that the total instructional situation is compatible with 
sound educational practice. This requirement is not always easy to 
satisfy. For example, it is doubtful whether the rotation method is 
appropriate when the total experimental period is less than two terms 
or semesters. Certainly, it is not compatible with typical educational 
practice to rotate the teachers when the total period is only a few 
weeks. The control of instructional techniques by detailed directions, 
which are to be followed rigidly, may lead to teaching that is not 
compatible with sound educational practice because in good teaching 
there must be adaptation of techniques to the needs and purposes of 
pupils as they become apparent. 

As pointed out on pages 54-55, attempts to control teacher factors 
by having each teacher instruct an experimental group and a control 
group will not always be successful. If the requirement of compati- 
bility with sound educational practice is observed, it is not likely 
that this procedure will result in control of zeal and effort, which
were shown to be important educative factors. 

It should be noted that control may be secured by measuring the 
differences that exist and allowing for them in interpreting the differ- 
ence in the gains in achievement. Although this method will seldom 
secure precise control, because we lack instruments for measuring most 
of the important educative factors, it is advisable to keep a detailed log 
of the experiment and to note in this any observed differences. By 
so doing, a critical investigator will usually be able to avoid over- 
looking gross failures to control important educative factors. 

3. Measurement of achievement. The general difficulty of securing 
reliable and valid measures of achievement in experimental
investigations was discussed on pages 61-69, and little more needs to be
said here. In order to understand the seriousness of this difficulty, it 
is necessary to bear in mind that the problem being studied specifies 
either explicitly or implicitly the achievement to be measured, and fre- 
quently the specifications include a number of relatively subtle ele- 
ments of achievement. For example, if the project method is made 
the experimental factor, the claims made for this method by its ad- 
vocates imply that the outcomes include such general patterns of 
conduct as initiative, resourcefulness, persistence, and interest in 
school work. In many experiments the quality of permanency of 
achievement is implied. For example, in reviewing the experiments on 
supervised study, it was pointed out that permanency of the 
study habits should be considered by ascertaining if they functioned 
in future study. 

4. Generalizing. In order to generalize from the results of an 
experiment, it is necessary to have some index of the degree to which 
the pupils included in the experiment are representative of the larger 
population for which it is desired to state conclusions. If it can be 
demonstrated that the group of pupils included in the experiment is 
representative of the larger population, the investigator may state his 
conclusions as generalizations. If it is known that the group of 
pupils included in the experiment constitutes a random sample of the 
larger population, formulae are available for calculating the allow- 
ance that must be made for the operation of chance. Unfortunately 
the group of pupils available for experimental purposes can seldom 
be selected by a process of random sampling, and usually it is not 
possible to prove them highly representative. Hence, the experimenter 
faces the task of generalizing from data secured from a group of 
pupils whose degree of representativeness is not known in any precise way.
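The text does not name the formulae for the allowance that must be made for chance. One standard formula found in educational statistics texts is the standard error of the difference between two independent means, with the obtained difference divided by that standard error (often called the “critical ratio”). The sketch below assumes that both groups are independent random samples of the population concerned, which, as the paragraph above stresses, is seldom true in practice; the function names and the scores are hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error.

    Assumes the two groups are independent random samples of the
    population to which the conclusion is to be extended.
    """
    se_a = standard_error_of_mean(sd_a, n_a)
    se_b = standard_error_of_mean(sd_b, n_b)
    # Standard error of the difference between two independent means.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean_a - mean_b) / se_diff


# Hypothetical final-test scores: experimental group vs. control group.
ratio = critical_ratio(52.0, 10.0, 100, 48.0, 10.0, 100)
print(round(ratio, 2))  # 2.83
```

A larger ratio makes it less likely that the difference arose from the chances of sampling alone. The formula says nothing, however, about a group that was not randomly selected, nor about non-experimental factors such as teacher zeal that were left uncontrolled.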
In generalizing, it is necessary to consider also the representativeness
of the total instructional situation. For example, if a class-size
experiment is organized so that each teacher has a large class and 
a small class, the experimenter is justified in generalizing only for 
such teaching situations. 

Controlled experimentation versus informal experimentation. In 
attempting to evaluate educational experimentation as a research 
procedure, it is important to distinguish between the types commonly 
designated as controlled and informal experimentation. Controlled ex- 
perimentation involves careful control of all non-experimental factors 
and is designed to lead to relatively precise and dependable results. 
In other words the conclusion is expected to be a definite statement 
of the relative merits of the educative procedures compared. Since 
this conclusion is based on objective data it is expected to be depend- 
able and final, at least within the defined limits of the investigation. 

Informal experimentation may be thought of as the trying out of 
an educative procedure to ascertain whether it works. Fundamentally 
it differs from controlled experimentation in the degree of refinement 
of the experimental procedure. For example, a teacher who tries out 
a new textbook controls other educative factors only to a very limited 
extent. The control group may be a class or classes taught during 
a previous term, and pupil achievement is measured in terms of 
teacher estimates or grades made on the final examination. The 
teacher may conclude that the textbook is unsatisfactory, but this 
conclusion cannot be regarded as a demonstrated fact. 

Informal experimentation is frequently profitable. The teacher 
and others connected with the experimentation are usually stimulated 
and the findings may be in the direction of truth. But such
investigations cannot be expected to contribute to our fund of scientific
knowledge relating to education. 

The outlook. When we consider the crucial difficulties encountered 
in experimental investigations, it is difficult to be very optimistic 
in regard to the improvement of research procedures so that the find- 
ings will be highly dependable. As we have indicated, there is evi- 
dence that experimental techniques are being improved, and it is 
possible to present a strong case in support of the statement that we 
are leaving the plateau period. It is more difficult to predict the 
future, but it seems doubtful whether we are justified in expecting 
that in time it will be possible to set up an experiment or a group
of experiments that will yield definite and final answers to any
question concerning the relative merits of a given educative factor. 
Some questions can be answered satisfactorily. A few have been
answered. But for many questions, perhaps most questions, it is 
likely that we are not justified in expecting more than an “indication.” 

Controlled experimentation, however, is worthwhile. In addition 
to the dependable information that may be contributed, there are 
valuable by-products. Experimental investigations are stimulative. 
Experiments with the project method have stimulated a greater in- 
terest in this instructional procedure, and we know that under certain 
conditions it works. 


